net: default TCP Keep-Alive interval causes significant power usage #48622

Open
ValdikSS opened this issue Sep 25, 2021 · 9 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Comments

@ValdikSS

ValdikSS commented Sep 25, 2021

Description

Golang's default TCP Keep-Alive is 15 seconds for both listening and connecting sockets.
Every time you use golang software, or connect to a website that uses long polling or WebSockets and is served by golang, your cell phone battery drains a lot quicker than it should.
The change was originally introduced by:
https://go-review.googlesource.com/c/go/+/107196

There's a modern proxy application called V2ray, and it's available on Android as well. It's written in Go.
I noticed that my phone sends keep-alive packets every 3-5 seconds while keeping only 7 TCP sockets open. The battery died rather quickly.

The current Golang version has three issues with the TCP Keep-Alive interval:

  1. It is enabled by default on both listening and connecting sockets (net.Dialer / net.ListenConfig).
  2. It is very short (15 seconds), which creates unnecessary network load and makes the cellphone radio module wake up much more frequently than it should.
  3. The net.Dialer / net.ListenConfig KeepAlive option sets both the Keep-Alive time (TCP_KEEPIDLE) and the Keep-Alive interval (TCP_KEEPINTVL) to the same value (they can't be configured separately).

The behavior in the last item is totally incorrect in my opinion. Linux sends 9 keep-alive probes, TCP_KEEPINTVL apart, before closing the socket, so setting net.Dialer KeepAlive to 300 seconds means a hung socket is detected only after 50 minutes (300 + 300*9 seconds).
If golang set only TCP_KEEPIDLE and did not touch TCP_KEEPINTVL, a 300-second KeepAlive with the default Linux interval (TCP_KEEPINTVL=75) would close the socket after ≈16 minutes (300 + 75*9 seconds), which is correct and expected. The latter behavior is widely used elsewhere.

Please note that golang also sets TCP_KEEPIDLE and TCP_KEEPINTVL by default for all listening and accepted sockets: not only golang clients, but also any clients connecting to golang servers are affected by the short timeout.

What version of Go are you using (go version)?

$ go version
go version go1.17.1 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Reproducible on any architecture and any OS.

What did you do?

Use net.Dialer / net.ListenConfig with default settings.

What did you expect to see?

Sane Keep-Alive values

What did you see instead?

Very short Keep-Alive period and inability to tune TCP_KEEPIDLE and TCP_KEEPINTVL separately.

@beoran

beoran commented Sep 26, 2021

As a workaround, it should be possible to call syscall.SetsockoptInt on the FD of the socket. There is an example here for a different socket option: https://stackoverflow.com/questions/40544096/how-to-set-socket-option-ip-tos-for-http-client-in-go-language#40549614

@mknyszek mknyszek changed the title from "Default TCP Keep-Alive interval is very short (15s), drains cell phone battery" to "net: default TCP Keep-Alive interval causes significant power usage" Oct 4, 2021
@mknyszek mknyszek added the NeedsInvestigation label Oct 4, 2021
@mknyszek mknyszek added this to the Backlog milestone Oct 4, 2021
@mknyszek
Contributor

mknyszek commented Oct 4, 2021

CC @neild

@ValdikSS
Author

Hello, any updates, discussions, ideas?

@ianlancetaylor
Contributor

The argument for why there is a default keep-alive value is #23459.

It is of course possible for a program to change the default, as I think you know.

Are you suggesting that we change the default behavior on android and ios? If so, what should we change it to?

@ValdikSS
Author

ValdikSS commented Jan 30, 2022

The original idea was to have about 3 minutes for dead peer detection, but the current implementation achieves that suboptimally.

Right now this is done by sending up to 9 keep-alive probes, one every 15 seconds (on Linux); if the peer does not respond to any of them, the connection is considered broken after 2 minutes 30 seconds (15 + 15*9).

TCP_KEEPIDLE=15
TCP_KEEPINTVL=15
TCP_KEEPCNT=9

However, the same could be achieved without spending too much battery life, by configuring a bigger initial timeout and a smaller number of probes (TCP_KEEPCNT). Like this:

TCP_KEEPIDLE=180
TCP_KEEPINTVL=15
TCP_KEEPCNT=2

This configuration begins probing the client only after 3 minutes (180 seconds) of inactivity on the socket. After 2 unanswered keep-alive probes sent 15 seconds apart, that is, after 3 minutes 30 seconds in total (180 + 15*2), the connection would be considered broken.

> Are you suggesting that we change the default behavior on android and ios?

This should be changed globally, because changing it only on Android/iOS would still affect the devices connecting to the servers with low timeouts.

Please also take a look at the arguments here: #23459 (comment)
It states the problems I'm describing in this ticket, and it's from 2018.

I also believe that a 3-minute timeout is still too low; consider setting it to at least 5 minutes.

@ianlancetaylor
Contributor

Note that I think there are some portability concerns here. For example, as far as I know OpenBSD does not permit setting these values individually for each socket.

@ianlancetaylor ianlancetaylor modified the milestones: Backlog, Go1.19 Jan 31, 2022
@ValdikSS
Author

ValdikSS commented Jun 6, 2022

Regarding "why 5 minutes":

The mean travel time between two metro stations in Saint Petersburg, Russia is about 2 minutes. My operator's cellular coverage is available only at the stations, not in the tunnels between them. Since the train waits at a station for only about 20-30 seconds, my cellphone does not always manage to find the network and connect to it, and then another 2 minutes are needed to reach the next station and connect there.
Although this is a very selfish way to calculate it, I think it is a fairly realistic scenario of cellular connectivity loss on a global scale. It usually takes 2-3 minutes to move between connectivity spots on a train, in the metro, or under a bridge, at least in Europe. It usually takes 2-3 minutes to walk from your desk to the office kitchen, make a tea or coffee, and come back. I hope you get the idea.

@ianlancetaylor
Contributor

Nothing happened for 1.19. Moving to 1.20.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.19, Go1.20 Jun 24, 2022
@gopherbot gopherbot modified the milestones: Go1.20, Go1.21 Feb 1, 2023
@Gr33nbl00d

Gr33nbl00d commented Apr 26, 2023

Personally, I think the main problem here is not the probe count but that we cannot set the keep-alive interval and the idle timeout separately.

We have a similar problem with the TCP keep-alive mechanism in golang. It is practically useless for us because of this limitation, which has been reported since 2014; please also have a look here: #8328

Also my comment here:
#8328 (comment)

Being able to set a high idle time would prevent flooding the network and draining batteries, while a low interval would still help detect failed connections quickly. That would be enough for us. Of course, being able to control the keep-alive count would be good too, but it may not even be necessary, because you could adjust the interval to your needs.


6 participants