Random disconnecting, won't connect anymore #2346

Closed
mrjaros opened this issue Feb 22, 2019 · 29 comments
Labels
Category: Stabiliy Things that work, but not as long as desired Category: Wifi Related to the network connectivity Status: Fixed Commit has been made, ready for testing Status: Needs Info Needs more info before action can be taken

Comments

@mrjaros

mrjaros commented Feb 22, 2019

Summary of the problem/feature request

The node disconnects at random times after a reset or power-on. After some time, it no longer connects at all. After pressing the physical reset button, the node works again until the next occurrence.
This behavior occurs only on my home network (wifi-enabled cable modem); at my workplace the problem does not exist.

Expected behavior

The node should run stably and be accessible all the time.

Steps to reproduce

  1. come to my home
  2. switch on nodemcu with espeasy firmware
  3. watch it crash

System configuration

Hardware:
I've tested this issue on two devices, both gave me the same results. Devices vendors:

ESP Easy version:
ESPEasy_mega-20190121

ESP Easy settings/screenshots:

Rules or log data

https://pastebin.com/th2fbmR6

@TD-er
Member

TD-er commented Feb 22, 2019

Does the WiFi use a fixed channel, or is it allowed to hop to any channel?
Please try with a fixed WiFi channel.

@mrjaros
Author

mrjaros commented Feb 23, 2019

Thank you. I set the 2.4G channel to fixed (11) and left 5G unchanged (automatic channel). I will let you know if it worked.

@TD-er TD-er added Status: Needs Info Needs more info before action can be taken Category: Stabiliy Things that work, but not as long as desired Category: Wifi Related to the network connectivity labels Feb 23, 2019
@mrjaros
Author

mrjaros commented Feb 25, 2019

Hi, I left the node running over the weekend and unfortunately I needed to reset it twice. So switching to a fixed channel did not help.

@TD-er
Member

TD-er commented Feb 25, 2019

OK, thanks for testing at least.
I can't see in your paste-bin logs what build you're running.
Can you give the build info from the sysinfo page?

@mrjaros
Author

mrjaros commented Feb 25, 2019

Thank you for your attention. I will do it this evening (CET), as I have no access to my home network right now.

@mrjaros
Author

mrjaros commented Feb 25, 2019

Build:⋄
20103 - Mega
Libraries:⋄
ESP82xx Core 2_4_2, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3 PUYA support
GIT version:⋄
mega-20190212
Plugins:⋄
46 [Normal]
Build time:⋄
Feb 12 2019 03:19:20
Binary filename:⋄
ESP_Easy_mega-20190212_normal_ESP8266_4096.bin

Unit: 0
Uptime: 0 days 0 hours 1 minutes
Load: 18.00% (LC=6032)
Free Mem: 19064 (17128 - sendContentBlocking)
Free Stack: 3568 (2128 - sendContentBlocking)
Boot: Manual reboot (17)
Reset Reason: External System
Network ❔
Wifi: 802.11N (RSSI -45 dB)
IP config: DHCP
IP / subnet: 192.168.0.248 / 255.255.255.0
GW: 192.168.0.1
Client IP: 192.168.0.234
DNS: 62.179.1.60 / 62.179.1.61
Allowed IP Range: 192.168.0.0 - 192.168.0.255
STA MAC: A0:20:A6:17:A0:64
AP MAC: A2:20:A6:17:A0:64
SSID: virus (38:43:7D:F0:F6:8D)
Channel: 11
Connected: 1 m 13 s
Last Disconnect Reason: (1) Unspecified
Number reconnects: 0
Firmware
Build:⋄ 20103 - Mega
Libraries:⋄ ESP82xx Core 2_4_2, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3 PUYA support
GIT version:⋄ mega-20190212
Plugins:⋄ 46 [Normal]
Build Md5: 883bbe44b9c9e91ebfa89a90adfb2c47
Md5 check: passed.
Build time:⋄ Feb 12 2019 03:19:20
Binary filename:⋄ ESP_Easy_mega-20190212_normal_ESP8266_4096.bin
System Status
Syslog Log Level: None
Serial Log Level: None
Web Log Level: None
ESP board
ESP Chip ID: 1548388 (0x17A064)
ESP Chip Freq: 80 MHz
ESP Board Name: PLATFORMIO_ESP12E
Storage
Flash Chip ID: Vendor: 0xE0 Device: 0x4016
Flash Chip Real Size: 4096 kB
Flash IDE Size: 4096 kB
Flash IDE speed: 40 MHz
Flash IDE mode: DIO
Flash Writes: 0 daily / 0 boot
Sketch Size: 778 kB (2292 kB free)
SPIFFS Size: 934 kB (860 kB free)

@TD-er
Member

TD-er commented Feb 25, 2019

In the advanced settings, almost at the bottom, I added a new setting to force B/G only wifi network.
That's probably already present in your build too.
Can you test if that will make things better?
Also, there is a test version in the latest build that uses core 2.6.0.
This core library has a lot of fixes.
It is almost the same as core 2.5.0 at this moment (core 2.5.0 was released a few days ago), but that version uses SDK 3.0.0, which has a lot of other issues.
The test builds I added using core 2.6.0 are there in 2 versions:

  • SDK 2.2.1 (same as you're running now)
  • SDK 3.0.0

Could you also test those, to see whether these issues are related to the SDK or to something that may or may not be fixed in core 2.5.0/2.6.0?

@mrjaros
Author

mrjaros commented Feb 25, 2019

Okay. I've checked, and there is no such option in my build. So I will try the latest build now and experiment with the possible solutions you proposed. I will test one change at a time to make sure we find the real solution. Since every test takes about one day, I will keep you updated during the whole process.

@TD-er
Member

TD-er commented Feb 25, 2019

Just to give you an idea what you're looking for:
image

@mrjaros
Author

mrjaros commented Feb 26, 2019

Okay, I've just come back from work and my nodemcu is 'dead' again. For this test I only updated the firmware to the latest build we were talking about earlier. Now I have restarted it and checked the option "Restart WiFi on lost conn.". I will let you know tomorrow if this worked.

@mrjaros
Author

mrjaros commented Feb 27, 2019

So far so good. It has been up and running for nearly 24 hours now. And it looks like it really got restarted automagically, because I see only 56 minutes of uptime on the nodemcu info page.

Please give me a few more days to confirm that this resolved the issue before closing.

@TD-er
Member

TD-er commented Feb 27, 2019

Just to make it clear what you're doing.

  • "Restart WiFi on lost conn" should only stop the WiFi transceiver and restart it including all WiFi initialization steps also performed on reboot.
  • "Connection Failure Threshold" should perform a reboot of the node when the set N connection attempts have failed. (N = 0 disables this feature)

So which one are you trying to test?

@mrjaros
Author

mrjaros commented Feb 28, 2019

I only checked "Restart WiFi on lost conn". Sadly, the device is unreachable on the network again. Should I force b/g, set the connection failure threshold, or start debugging via serial (as I did when I collected logs last time)?

@TD-er
Member

TD-er commented Feb 28, 2019

When using force B/G, the wifi seems to be a little more robust with low RSSI.

@mrjaros
Author

mrjaros commented Feb 28, 2019

Signal strength is not an issue here, as the mcu is about 1 meter from the router and there are no physical obstacles between them. They are placed on the same shelf.

@GogaPit

GogaPit commented Mar 1, 2019

Hi TD-er, I had a similar problem. Turning on "Restart WiFi on lost conn" seemed to help. All was well for 3 days. On one of my esp8266 units, the “Uptime” in “System Info” was reset last night. I think it's because of a Wi-Fi failure. All espeasy settings are saved and the esp web interface works. Where can I read about the other options in “Special and Experimental Settings”? Thank you!

@TD-er
Member

TD-er commented Mar 1, 2019

Special and Experimental Settings

  • Fixed IP Octet: Sets the last byte of the IP address to this value, regardless of what IP is given using DHCP (all other settings received via DHCP will be used)
  • WD I2C Address: The Watchdog timer can be accessed via I2C, but I am not sure what can be set or read when accessing it.
  • Use SSDP: Is disabled for now since it is causing crashes. SSDP can be used to help autodiscovery of a node. For example Windows uses it to find hosts on a network.
  • Connection Failure Threshold: Number of failed network connect attempts before issuing a reboot (0 = disabled)
  • Force WiFi B/G: Force the WiFi to use only 802.11-B or -G protocol (not -N)
  • Restart WiFi on lost conn.: Force a complete WiFi radio shutdown & restart when connection with access point is lost.
  • I2C ClockStretchLimit: See "I2C-bus.org - Clock Stretching" and "ESPeasy wiki - Basics: The I2C Bus".
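
To make the difference between "Connection Failure Threshold" and "Restart WiFi on lost conn." concrete, here is a minimal Python sketch that models the two settings as described in the list above. It is not ESPEasy's code: the class name `WifiRecoveryModel`, its method names, and the choice to reboot once the failure count reaches the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WifiRecoveryModel:
    """Toy model of the two recovery settings (hypothetical, not firmware code)."""
    connection_failure_threshold: int = 0   # 0 = reboot feature disabled
    restart_wifi_on_lost_conn: bool = False
    failed_attempts: int = 0
    actions: list = field(default_factory=list)

    def on_connection_lost(self) -> None:
        # With the option enabled, the radio is shut down and re-initialized.
        if self.restart_wifi_on_lost_conn:
            self.actions.append("restart_wifi_radio")

    def on_connect_attempt(self, success: bool) -> None:
        if success:
            self.failed_attempts = 0
            self.actions.append("connected")
            return
        self.failed_attempts += 1
        threshold = self.connection_failure_threshold
        # Assumption: the reboot fires once the count reaches the threshold.
        if threshold > 0 and self.failed_attempts >= threshold:
            self.actions.append("reboot")
            self.failed_attempts = 0  # the counter starts over after a reboot


model = WifiRecoveryModel(connection_failure_threshold=2,
                          restart_wifi_on_lost_conn=True)
model.on_connection_lost()
model.on_connect_attempt(False)
model.on_connect_attempt(False)
print(model.actions)  # ['restart_wifi_radio', 'reboot']
```

With the threshold left at 0, the same sequence of failed attempts never produces a reboot in this model; only the radio restart is recorded.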

@giig1967g
Contributor

Hi @TD-er
did you also implement the possibility to disable the power saving mode as per #1640 (comment)?

@GogaPit

GogaPit commented Mar 1, 2019 via email

@TD-er
Member

TD-er commented Mar 1, 2019

The name "octet" is confusing, but it just means 8 bits or 1 byte.
So if you receive 192.168.1.234 from your DHCP server and this value is set to "10", then the IP used by your node is 192.168.1.10.
But since you're receiving more information from the DHCP server, like subnet mask / gateway / DNS, it may still be useful.
This allows a somewhat static IP in your network (use it with an 'octet' outside the range of the DHCP IPs) while still being set to DHCP.
So if you take the node to another network which uses 192.168.52.x, then you know it will be on 192.168.52.10 (when setting this value to "10").
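
The octet replacement described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the firmware's implementation; the function name `apply_fixed_octet` is hypothetical, and the sketch does not model how the setting is disabled.

```python
import ipaddress


def apply_fixed_octet(dhcp_ip: str, fixed_octet: int) -> str:
    """Replace the last byte of a DHCP-assigned IPv4 address (illustrative only)."""
    if not 0 <= fixed_octet <= 255:
        raise ValueError("octet must fit in one byte (0-255)")
    address = int(ipaddress.IPv4Address(dhcp_ip))
    # Clear the lowest 8 bits, then put the fixed value in their place.
    return str(ipaddress.IPv4Address((address & 0xFFFFFF00) | fixed_octet))


print(apply_fixed_octet("192.168.1.234", 10))  # 192.168.1.10
print(apply_fixed_octet("192.168.52.77", 10))  # 192.168.52.10
```

The first three bytes always come from the DHCP lease, which is why the node follows you to another network while keeping a predictable last byte.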

@TD-er
Member

TD-er commented Mar 1, 2019

Hi @TD-er
did you also implement the possibility to disable the power saving mode as per #1640 (comment)?

Nope not yet.
But since I'm now working on power save mode (to improve stability), that's something I will now also look at.

@uzi18
Contributor

uzi18 commented Mar 1, 2019

  • WD I2C Address: The Watchdog timer can be accessed via I2C, but I am not sure what can be set or read when accessing it.

I believe it is an atmega mcu with special firmware, used as a watchdog and/or I/O expander.
You need to send it something regularly, e.g. every 1s, so that the external cpu does not reset the esp; if the esp hangs, the messages stop and it gets reset.
In fact, we need to deal with the opposite situation: too frequent resets caused by exceptions =)
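
As a rough illustration of the external watchdog idea, here is a small Python model. It assumes nothing about the real atmega firmware or its I2C protocol: the class `ExternalWatchdogModel`, its methods, and the 3 second timeout are hypothetical and only show the "feed it or get reset" logic.

```python
class ExternalWatchdogModel:
    """Toy model of an external watchdog that resets the ESP when it is not fed."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s  # hypothetical timeout, not a documented value
        self.last_feed_s = 0.0
        self.resets = 0

    def feed(self, now_s: float) -> None:
        # The ESP "sends something" to prove it is still alive.
        self.last_feed_s = now_s

    def tick(self, now_s: float) -> bool:
        """Return True if the watchdog resets the ESP at time now_s."""
        if now_s - self.last_feed_s > self.timeout_s:
            self.resets += 1
            self.last_feed_s = now_s  # the countdown restarts after a reset
            return True
        return False


wd = ExternalWatchdogModel(timeout_s=3.0)
wd.feed(1.0)
wd.feed(2.0)
print(wd.tick(4.0))  # False: the last feed was 2 s ago
print(wd.tick(6.0))  # True: 4 s have passed without a feed
```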

@giig1967g
Contributor

Hi @TD-er
did you also implement the possibility to disable the power saving mode as per #1640 (comment)?

Nope not yet.
But since I'm now working on power save mode (to improve stability), that's something I will now also look at.

Excellent, because I suspect that this could also affect the HW WD.
Will it be a flag (Enable/Disable power save mode)?

@TD-er
Member

TD-er commented Mar 1, 2019

Will it be a flag (Enable/Disable power save mode)?

Yep, one to make the idle calls in the scheduler and one for the wifi sleep mode.

@mrjaros
Author

mrjaros commented Mar 6, 2019

It looks like it is running stably now. I've tested it for almost a week and had no issues. I've checked "Restart WiFi on lost conn" and set the connection failure limit to 2.

I think this issue is resolved, thank you for your assistance.

@TD-er
Member

TD-er commented Mar 6, 2019

Just curious, since you're stating you tested it for a week now. (and the power save options were only merged yesterday)
What version/build were you testing with?
And using what settings?

@TD-er TD-er added the Status: Fixed Commit has been made, ready for testing label Mar 6, 2019
@mrjaros
Author

mrjaros commented Mar 6, 2019

I use mega-20190225. I can send detailed info this evening (CET), because I'm not able to reach my home network now. The settings which work for me are as I stated before: "Restart WiFi on lost conn" and connection failure limit = 2.

@mrjaros
Author

mrjaros commented Mar 6, 2019

I've found one spare mcu in my endless bag which I believe is configured exactly like the one I tested, so:

Unit: 0
Uptime: 0 days 0 hours 0 minutes
Load: 99.50% (LC=2)
Free Mem: 8984 (7000 - sendContentBlocking)
Free Stack: 3616 (2080 - sendContentBlocking)
Boot: Manual reboot (41)
Reset Reason: Software/System restart
Network ❔
Wifi: 802.11N (RSSI -60 dB)
IP config: DHCP
IP / subnet: 192.168.1.112 / 255.255.255.0
GW: 192.168.1.1
Client IP: 192.168.1.100
DNS: 192.168.1.1 / 192.168.1.1
Allowed IP Range: 192.168.1.0 - 192.168.1.255
STA MAC: 5C:CF:7F:B2:EB:11
AP MAC: 5E:CF:7F:B2:EB:11
SSID: GniazdoNajwredniejszychJezy (F4:DC:F9:47:F4:6F)
Channel: 4
Connected: 8763 ms
Last Disconnect Reason: (201) No AP found
Number reconnects: 0
Firmware
Build:⋄ 20103 - Mega
Libraries:⋄ ESP82xx Core 2.6.0-dev, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support
GIT version:⋄ mega-20190225
Plugins:⋄ 76 [Normal] [Testing]
Build Md5: 798fe05fb52ee4162865020f2b9f
Md5 check: passed.
Build time:⋄ Feb 25 2019 03:21:35
Binary filename:⋄ ESP_Easy_mega-20190225_test_core_260_sdk3_alpha_ESP8266_4M.bin
System Status
Syslog Log Level: Debug More
Serial Log Level: Debug More
Web Log Level: None
ESP board
ESP Chip ID: 11725585 (0xB2EB11)
ESP Chip Freq: 80 MHz
ESP Board Name: PLATFORMIO_ESP12E
Storage
Flash Chip ID: Vendor: 0xC8 Device: 0x4016
Flash Chip Real Size: 4096 kB
Flash IDE Size: 4096 kB
Flash IDE speed: 40 MHz
Flash IDE mode: DIO
Flash Writes: 0 daily / 0 boot
Sketch Size: 930 kB (2140 kB free)
SPIFFS Size: 934 kB (860 kB free)

One thing that is disturbing is the excessive load (I get many timeouts), but that is out of scope for this issue. I will try this particular device on my home network, and possibly flash a fresh build onto it.

@TD-er
Member

TD-er commented Mar 6, 2019

The load is always reported high at the beginning.
If you wait 30+ seconds, it will be reported normal.
At boot, it is mainly an issue of how it is computed.

@mrjaros mrjaros closed this as completed Mar 7, 2019

5 participants