Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cooldown before reconnecting on expected disconnect #397

Merged
merged 3 commits into from
Mar 6, 2023

Conversation

bdraco
Copy link
Member

@bdraco bdraco commented Mar 6, 2023

If the esp gets an OTA or requests a disconnect because its rebooting, we would reconnect too quickly before the device had shutdown accepting socket connections and end up trying to talk to a device that was moments away from rebooting which would delay reconnecting once the device came back up

If the esp gets an OTA or requests a disconnect because its
rebooting, we would reconnect too quickly before the device
had shutdown accepting socket connections and end up trying
to talk to a device that was moments away from rebooting which
would delay reconnecting once the device came back up
If the esp gets an OTA or requests a disconnect because its
rebooting, we would reconnect too quickly before the device
had shutdown accepting socket connections and end up trying
to talk to a device that was moments away from rebooting which
would delay reconnecting once the device came back up
@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

2023-03-05 14:32:31.086 DEBUG (MainThread) [aioesphomeapi.connection] officenew89proxy @ 192.168.107.51: Got message of type <class 'api_pb2.DisconnectRequest'>: 
2023-03-05 14:32:31.086 DEBUG (MainThread) [aioesphomeapi.connection] 192.168.107.51: Sending <class 'api_pb2.DisconnectResponse'>: 
2023-03-05 14:32:31.087 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame e5242fe52a58bcf13d816ed4bc085ae634b45e41
2023-03-05 14:32:31.087 DEBUG (MainThread) [aioesphomeapi.connection] Cleaning up connection to officenew89proxy @ 192.168.107.51
2023-03-05 14:32:31.088 DEBUG (MainThread) [aioesphomeapi.connection] Cleaning up connection to officenew89proxy @ 192.168.107.51
2023-03-05 14:32:31.089 INFO (MainThread) [aioesphomeapi.reconnect_logic] Processing expected disconnect from ESPHome API for officenew89proxy @ 192.168.107.51
2023-03-05 14:32:34.094 DEBUG (MainThread) [aioesphomeapi.connection] officenew89proxy @ 192.168.107.51: Connecting to 192.168.107.51:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.107.51', port=6053)))
2023-03-05 14:32:36.085 DEBUG (MainThread) [aioesphomeapi.connection] officenew89proxy @ 192.168.107.51: Opened socket to 192.168.107.51:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.107.51', port=6053)))
2023-03-05 14:32:36.088 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame 
2023-03-05 14:32:36.096 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame 008e8295c5975df2f5c3350d1703ac3c9be8ea3fc99b5782a3654a91d1aee18d7a7471a5e209cf64d001484c7ab89b67fe
2023-03-05 14:32:36.156 DEBUG (MainThread) [aioesphomeapi._frame_helper] Starting handshake...
2023-03-05 14:32:36.157 DEBUG (MainThread) [aioesphomeapi._frame_helper] Handshake complete

@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

reconnect is way faster now for wifi ones as well because we don't have to fail first


2023-03-05 14:36:08.943 DEBUG (MainThread) [aioesphomeapi.connection] m5stackatomproxy @ 192.168.106.43: Got message of type <class 'api_pb2.DisconnectRequest'>: 
2023-03-05 14:36:08.944 DEBUG (MainThread) [aioesphomeapi.connection] 192.168.106.43: Sending <class 'api_pb2.DisconnectResponse'>: 
2023-03-05 14:36:08.944 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame e30d7119fe7ca135859ebc3e2dec48f0759e1068
2023-03-05 14:36:08.944 DEBUG (MainThread) [aioesphomeapi.connection] Cleaning up connection to m5stackatomproxy @ 192.168.106.43
2023-03-05 14:36:08.946 DEBUG (MainThread) [aioesphomeapi.connection] Cleaning up connection to m5stackatomproxy @ 192.168.106.43
2023-03-05 14:36:08.946 INFO (MainThread) [aioesphomeapi.reconnect_logic] Processing expected disconnect from ESPHome API for m5stackatomproxy @ 192.168.106.43
2023-03-05 14:36:08.948 WARNING (MainThread) [switchbot.devices.device]  (C6:D5:B5:C1:2A:22): Device unexpectedly disconnected; RSSI: -64
2023-03-05 14:36:11.954 DEBUG (MainThread) [aioesphomeapi.connection] m5stackatomproxy @ 192.168.106.43: Connecting to 192.168.106.43:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.106.43', port=6053)))
2023-03-05 14:36:18.760 DEBUG (MainThread) [aioesphomeapi.connection] m5stackatomproxy @ 192.168.106.43: Opened socket to 192.168.106.43:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.106.43', port=6053)))
2023-03-05 14:36:18.762 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame 
2023-03-05 14:36:18.788 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame 00ed5d6401c91f2944817854561554f88da5728804b0f945fdffb78346c8e8ef71cd575bf2bdaca7d340480d9ea680d190
2023-03-05 14:36:18.860 DEBUG (MainThread) [aioesphomeapi._frame_helper] Starting handshake...
2023-03-05 14:36:18.861 DEBUG (MainThread) [aioesphomeapi._frame_helper] Handshake complete
2023-03-05 14:36:18.862 DEBUG (MainThread) [aioesphomeapi.connection] 192.168.106.43: Sending <class 'api_pb2.HelloRequest'>: client_info: "Home Assistant 2023.4.0.dev0"
api_version_major: 1

@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

Much cleaner on the log watcher as well

[14:39:48][W][api.connection:083]: aioesphomeapi (192.168.107.5): Connection closed
[14:39:48][D][api.connection:920]: Home Assistant 2023.4.0.dev0 (192.168.107.5): Connected successfully
[14:39:48][D][time:045]: Synchronized time: 2023-03-05 14:39:48
[14:40:03][D][switch:013]: 'officenew89proxy Ethernet Restart' Turning ON.
[14:40:03][D][switch:056]: 'officenew89proxy Ethernet Restart': Sending state OFF
[14:40:03][I][restart:016]: Restarting device...
[14:40:03][I][app:134]: Rebooting safely...
INFO Processing expected disconnect from ESPHome API for officenew89proxy.local
WARNING Disconnected from API
INFO Successfully connected to officenew89proxy.local
[14:40:09][D][api:102]: Accepted 192.168.107.5
[14:40:09][D][api.connection:920]: Home Assistant 2023.4.0.dev0 (192.168.107.5): Connected successfully
[14:40:09][D][time:045]: Synchronized time: 2023-03-05 14:40:09

@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

and a power pull still reconnects right away


2023-03-05 14:41:33.933 DEBUG (MainThread) [aioesphomeapi.connection] 192.168.106.43: Sending <class 'api_pb2.PingRequest'>: 
2023-03-05 14:41:33.934 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame ece0b1aab88f675763deb0c243d0135cf65aed9a
2023-03-05 14:41:33.958 WARNING (MainThread) [aioesphomeapi.connection] m5stackatomproxy @ 192.168.106.43: Connection error occurred: [Errno 54] Connection reset by peer
2023-03-05 14:41:33.959 DEBUG (MainThread) [aioesphomeapi.connection] Cleaning up connection to m5stackatomproxy @ 192.168.106.43
2023-03-05 14:41:33.959 INFO (MainThread) [aioesphomeapi.reconnect_logic] Processing unexpected disconnect from ESPHome API for m5stackatomproxy @ 192.168.106.43
2023-03-05 14:41:33.963 DEBUG (MainThread) [aioesphomeapi.connection] m5stackatomproxy @ 192.168.106.43: Connecting to 192.168.106.43:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.106.43', port=6053)))
2023-03-05 14:41:33.969 DEBUG (MainThread) [aioesphomeapi.connection] m5stackatomproxy @ 192.168.106.43: Opened socket to 192.168.106.43:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.106.43', port=6053)))
2023-03-05 14:41:33.970 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame 
2023-03-05 14:41:33.982 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame 0081f04cfa3187e5fbe175c0e6d88f92545ba8f53ba73e158db89f7bfaf5458a6720064a8c0680cb02c6072dfb02cf9a42
2023-03-05 14:41:34.056 DEBUG (MainThread) [aioesphomeapi._frame_helper] Starting handshake...
2023-03-05 14:41:34.057 DEBUG (MainThread) [aioesphomeapi._frame_helper] Handshake complete
2023-03-05 14:41:34.058 DEBUG (MainThread) [aioesphomeapi.connection] 192.168.106.43: Sending <class 'api_pb2.HelloRequest'>: client_info: "Home Assistant 2023.4.0.dev0"
api_version_major: 1
api_version_minor: 7

@bdraco bdraco merged commit 51d581d into main Mar 6, 2023
@bdraco bdraco deleted the cooldown_expected_disconnect branch March 6, 2023 04:54
@nagyrobi
Copy link
Member

nagyrobi commented Mar 6, 2023

Did you try this on:

  • ESP8266
  • ESP32 via WiFi
  • ESP32 via Ethernet

I think 8266 needs different treatment...

@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

Tried on all 3 of those. They all behave the same. Why would the 8266 be different?

@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

This is from an s31 I did an OTA with:

2023-03-06 00:38:35.467 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_c @ 192.168.107.206: Got message of type <class 'api_pb2.DisconnectRequest'>:
2023-03-06 00:38:35.467 DEBUG (MainThread) [aioesphomeapi.connection] Cleaning up connection to maui_s31_c @ 192.168.107.206
2023-03-06 00:38:35.468 DEBUG (MainThread) [aioesphomeapi.connection] Cleaning up connection to maui_s31_c @ 192.168.107.206
2023-03-06 00:38:35.468 INFO (MainThread) [aioesphomeapi.reconnect_logic] Processing expected disconnect from ESPHome API for maui_s31_c @ 192.168.107.206
2023-03-06 00:38:38.471 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_c @ 192.168.107.206: Connecting to 192.168.107.206:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.107.206', port=6053)))
2023-03-06 00:38:51.023 DEBUG (MainThread) [aioesphomeapi.reconnect_logic] maui_s31_c @ 192.168.107.206: Triggering reconnect because of received mDNS record record[ptr,in,_esphomelib._tcp.local.]=4500/4499,maui_s31_c._esphomelib._tcp.local.
2023-03-06 00:38:53.521 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_c @ 192.168.107.206: Opened socket to 192.168.107.206:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.107.206', port=6053)))
2023-03-06 00:38:53.532 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_c @ 192.168.107.206: Got message of type <class 'api_pb2.HelloResponse'>: api_version_major: 1
server_info: "maui_s31_c (esphome v2023.3.0-dev)"
name: "maui_s31_c"
2023-03-06 00:38:53.532 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_c @ 192.168.107.206: Successfully connected ('maui_s31_c (esphome v2023.3.0-dev)' API=1.7)
2023-03-06 00:38:53.551 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_c @ 192.168.107.206: Got message of type <class 'api_pb2.ConnectResponse'>:
2023-03-06 00:38:53.551 INFO (MainThread) [aioesphomeapi.reconnect_logic] Successfully connected to maui_s31_c @ 192.168.107.206
2023-03-06 00:38:53.572 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_c @ 192.168.107.206: Got message of type <class 'api_pb2.DeviceInfoResponse'>: name: "maui_s31_c"
2023-03-06 00:38:53.608 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_c @ 192.168.107.206: Got message of type <class 'api_pb2.ListEntitiesBinarySensorResponse'>: object_id: "sonos_button"
unique_id: "maui_s31_cbinary_sensorsonos_button"
2023-03-06 00:38:53.647 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_c @ 192.168.107.206: Got message of type <class 'api_pb2.ListEntitiesBinarySensorResponse'>: object_id: "sonos_status"
unique_id: "maui_s31_cbinary_sensorsonos_status"

@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

Just did another 8266 to be sure it behaves the same

2023-03-06 05:44:24.715 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_b @ 192.168.106.235: Got message of type <class 'api_pb2.DisconnectRequest'>: 
2023-03-06 05:44:24.715 DEBUG (MainThread) [aioesphomeapi.connection] Cleaning up connection to maui_s31_b @ 192.168.106.235
2023-03-06 05:44:24.716 DEBUG (MainThread) [aioesphomeapi.connection] Cleaning up connection to maui_s31_b @ 192.168.106.235
2023-03-06 05:44:24.716 INFO (MainThread) [aioesphomeapi.reconnect_logic] Processing expected disconnect from ESPHome API for maui_s31_b @ 192.168.106.235
2023-03-06 05:44:27.719 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_b @ 192.168.106.235: Connecting to 192.168.106.235:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.106.235', port=6053)))
2023-03-06 05:44:42.087 DEBUG (MainThread) [aioesphomeapi.reconnect_logic] maui_s31_b @ 192.168.106.235: Triggering reconnect because of received mDNS record record[ptr,in,_esphomelib._tcp.local.]=4500/4499,maui_s31_b._esphomelib._tcp.local.
2023-03-06 05:44:42.773 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_b @ 192.168.106.235: Opened socket to 192.168.106.235:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.106.235', port=6053)))
2023-03-06 05:44:42.795 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_b @ 192.168.106.235: Got message of type <class 'api_pb2.HelloResponse'>: api_version_major: 1
server_info: "maui_s31_b (esphome v2023.3.0-dev)"
name: "maui_s31_b"
2023-03-06 05:44:42.795 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_b @ 192.168.106.235: Successfully connected ('maui_s31_b (esphome v2023.3.0-dev)' API=1.7)
2023-03-06 05:44:42.816 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_b @ 192.168.106.235: Got message of type <class 'api_pb2.ConnectResponse'>: 
2023-03-06 05:44:42.817 INFO (MainThread) [aioesphomeapi.reconnect_logic] Successfully connected to maui_s31_b @ 192.168.106.235
2023-03-06 05:44:42.833 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_b @ 192.168.106.235: Got message of type <class 'api_pb2.DeviceInfoResponse'>: name: "maui_s31_b"
2023-03-06 05:44:42.879 DEBUG (MainThread) [aioesphomeapi.connection] maui_s31_b @ 192.168.106.235: Got message of type <class 'api_pb2.ListEntitiesBinarySensorResponse'>: object_id: "master_av_button"

@nagyrobi
Copy link
Member

nagyrobi commented Mar 6, 2023

8266 needs more time after OTA due to different flash layout...

Is this also in your testing custom component?

@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

8266 needs more time after OTA due to different flash layout...

The cooldown is only enough time for the esp to stop accepting new connections right before the reboot. 1s probably would be enough to make sure it's down enough to not accept connections but I did 3s just to be sure since there is no chance it will be back up by than.

Is this also in your testing custom component?

Yes it is

@nagyrobi
Copy link
Member

nagyrobi commented Mar 6, 2023

Testing...

@nagyrobi
Copy link
Member

nagyrobi commented Mar 6, 2023

I still see inconsistent behavior during OTA. Depending on flash size, network etc, it may take longer to upload the firmware on some nodes than the others. Some nodes go offline in HA when starting OTA, some during OTA somewhere in the middle, and some don't go offline at all (or only for a very short time, unnoticeable in GUI).

I guess on ESPHome side special care should be taken to deliberately say goodbye to HA before starting to write the flash.

There even is on_begin but it doesn't seem to be respected because of blocking...

@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

I still see inconsistent behavior during OTA. Depending on flash size, network etc, it may take longer to upload the firmware on some nodes than the others. Some nodes go offline in HA when starting OTA, some during OTA somewhere in the middle, and some don't go offline at all (or only for a very short time, unnoticeable in GUI).

Is that from testing with this PR, or are these previous observations?

I would expect #395 improves the situation and reduces (probably not possible to eliminate all instances) the chance of premature disconnect during ota

@nagyrobi
Copy link
Member

nagyrobi commented Mar 6, 2023

Yes, with testing.

But can be considered as "cosmetic" as sooner or later the nodes do come back.

@bdraco
Copy link
Member Author

bdraco commented Mar 6, 2023

How long is the ota taking on the esps that are getting disconnected during the ota?

I expect anything that takes more than 10s before #395 to ota would have gone offline before. I expect you get at least 70s, maybe even the full 90s before you get a disconnect. I don't have any devices that take that long to ota so I don't have a good test case to validate that.

@nagyrobi
Copy link
Member

nagyrobi commented Mar 6, 2023

How long is the ota taking on the esps that are getting disconnected during the ota?

I feel it around 5 seconds...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants