Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout for TCP connect (IDFGH-1541) #3791

Closed
wants to merge 6 commits into from

Conversation

boarchuz
Copy link
Contributor

@boarchuz boarchuz commented Jul 17, 2019

Currently, the timeout_ms parameter is actually used for the SO_RCVTIMEO option rather than for the socket connection. Therefore, regardless of the timeout_ms value, connect() will block for a long time (~20 seconds) if a connection cannot be established (eg. the server is offline). This is very inefficient, especially in the common case of battery-powered devices connecting to a local server, where <100ms might be enough to determine if the server is online.

Efficiency aside, I think it's far more intuitive for a 'timeout_ms' parameter of 'tcp_connect' to actually be the connection timeout.

With this, tcp_connect will return an error if the socket is not write-able before timeout_ms expires. I've tested via esp_http_client.

  • It gets kinda if-if-iffy. Is "goto error" preferred style? (edit: updated to goto)
  • Also I'm unsure if there's anything that is dependent on SO_RCVTIMEO being set to timeout_ms here. I know that esp_http_client is fine (it passes timeout_ms separately to read) but tcp_connect might be called elsewhere. Should SO_RCVTIMEO still be timeout_ms?

@github-actions github-actions bot changed the title Timeout for TCP connect Timeout for TCP connect (IDFGH-1541) Jul 21, 2019
@boarchuz
Copy link
Contributor Author

With this config (where 192.168.0.123 is unreachable):

esp_http_client_config_t config = {
  .url = "http://192.168.0.123/",
  .timeout_ms = 1000,
}

On esp_http_client_perform() without changes: (~18 seconds to return)

D (1965) HTTP_CLIENT: Begin connect to: http://192.168.0.123:80
D (1967) TRANS_TCP: [sock=54],connecting to server IP:192.168.0.123,Port:80...
E (20222) HTTP_CLIENT: Connection failed, sock < 0

With changes: (~1000ms to return)

D (3550) HTTP_CLIENT: Begin connect to: http://192.168.0.123:80
D (3551) TRANS_TCP: [sock=56] Connecting to server. IP: 192.168.0.123, Port: 80
E (4554) TRANS_TCP: [sock=56] select() timeout
E (4555) HTTP_CLIENT: Connection failed, sock < 0

@david-cermak
Copy link
Collaborator

Thank you @boarchuz for sharing this useful udpate.
Yes, agree that the timeout argument in calling esp_transport_connect() may intuitevely indicate to be a connection timeout.

My concern is however the compatibility of existing applications that use esp_transport indirectly (e.g. via esp_http_client), where the only timeout configured is used for both connection and read/write operations. Thinking that the default value of 1000ms is
perfectly alright for a local server, but might not be always sufficient for internet/cloud backends, so it may introduce an intermittent connection issue to such applications after updating IDF version. To be on a safe side, perhaps a new config value could be introduced to setup connection timeout separately?

@boarchuz
Copy link
Contributor Author

Hi David,
I agree that esp_http_client should have a separate value (eg. connection_timeout_ms) since applications often have a clear need for very different timeouts for connecting and reading/writing (consider downloading a large file from a local server, for example).

I think that would be best addressed in a separate PR/issue though, especially since, as of 3.2.2 release, esp_http_client has a lot of other timing issues too. Namely that it is ignoring all other timeouts anyway (if a poll times out, it will keep retrying until the ~18 second socket timeout rather than immediately returning an error). There have been a lot of updates recently so I'll have to test it in its current state before raising those issues.

The 1000ms was an example, set in the config in the snippet above. It's not the default value. esp_http_client has a default 5000ms timeout.

I guess this PR is seeking to clarify whether tcp_connect should do what it says on the tin or if we're advised to fiddle with LWIP_TCP_SYNMAXRTX instead and leave this layer alone.

espressif-bot pushed a commit that referenced this pull request May 7, 2020
@Alvin1Zhang
Copy link
Collaborator

Thanks for contribution, please help refer to 0c7204e. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants