Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: conn.Read() methods seem to thrash CPU, please return on insufficient data #27315

Closed
matthalladay opened this issue Aug 28, 2018 · 21 comments

Comments

Projects
None yet
6 participants
@matthalladay
Copy link

commented Aug 28, 2018

  1. Version: Go 1.10
  2. Does it reproduce: I'm sure, this is more of a request than an issue
  3. Ubuntu 16.04 AMD64
  4. What did I do...
    It is likely that I missed something, but in implementing a network stream collector which sits and spins on arriving UDP messages (arrive every 40 microseconds) I see quite heavy CPU utilization.

This is using conn.SetReadDeadline with conn.Read. Note that the conn.SetReadDeadline is pulled out to a larger interval of every 25-50 packets rather than every packet. This same interval is used to insert Sleep() calls that allow the port to buffer packets for increased CPU efficiency. Somewhat complicated, but the end CPU utilization was around 1% CPU on Linux, down from utilizing most of a core.

Much of this is because when conn.Read() is called prior to a packet arriving the CPU spins on the read until data arrives, or until the ReadDeadline is hit. This would be much simpler with a method that returns on insufficient data (perhaps conn.TryRead()), or surfacing a conn.Length().

To this end I changed a line in the fd_unix.go file on my local install to achieve a conn.Read method that simple returns when data is unavailable. I understand that this would break backward compatibility, so am not suggesting a change to conn.Read(), but rather the addition of a method which allows for the non-blocking read call as this avoids burdening the kernel with repeated polling and in my experience results in fairly substantial CPU gains.

Additional benefit is that when working with Windows (assume a similar addition is possible there) - Sleep timeouts and timer precision are both much less granular.

I believe this addition will aid all streaming protocols and applications which want to maximize the number of streams and connections they are processing.

Inside fd_unix.go

// Read implements io.Reader.
func (fd *FD) Read(p []byte) (int, error) {
	if err := fd.readLock(); err != nil {
		return 0, err
	}
	defer fd.readUnlock()
	if len(p) == 0 {
		// If the caller wanted a zero byte read, return immediately
		// without trying (but after acquiring the readLock).
		// Otherwise syscall.Read returns 0, nil which looks like
		// io.EOF.
		// TODO(bradfitz): make it wait for readability? (Issue 15735)
		return 0, nil
	}
	if err := fd.pd.prepareRead(fd.isFile); err != nil {
		return 0, err
	}
	if fd.IsStream && len(p) > maxRW {
		p = p[:maxRW]
	}
	for {
		n, err := syscall.Read(fd.Sysfd, p)
		if err != nil {
			n = 0
			// cannot return from here as this results in many "resource temporarily unavailable" errors
			if err == syscall.EAGAIN && fd.pd.pollable() {
				if err = fd.pd.waitRead(fd.isFile); err == nil {
					//continue
					// replaced continue call with return n, err
					return n, err
					
				}
			}

			// On MacOS we can see EINTR here if the user
			// pressed ^Z.  See issue #22838.
			if runtime.GOOS == "darwin" && err == syscall.EINTR {
				continue
			}
		}
		err = fd.eofError(n, err)
		return n, err
	}
}

Am I missing some call which achieves a non-blocking read? Thanks for you time in reading this.

@ianlancetaylor ianlancetaylor changed the title conn.Read() methods seem to thrash CPU, please return on insufficient data net: conn.Read() methods seem to thrash CPU, please return on insufficient data Aug 28, 2018

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 28, 2018

What are you really trying to do? Why are you using SetReadDeadline? How many concurrent connections do you need to support?

Have you considered not using SetReadDeadline, but instead using an ordinary Read and sending the results on a channel? Then some other goroutine can use timers and read from the channel as it pleases.

@mattha78

This comment has been minimized.

Copy link

commented Aug 29, 2018

Thanks for the response. Answers below.
What are you really trying to do?
Developing a stream processing appliance. It accepts MHz sampling rate streams from multiple attached devices.

Why are you using SetReadDeadline?
It publishes an error if the connection receives no data for the set amount of time. SetReadDeadline is reset outside of an inner read and process loop. In profiling, it does not take much CPU to reset the Deadline every couple hundred packets.

How many concurrent connections do you need to support?
Requirements say 5 concurrent connections at the MHz rate. each connection with 9 channels, so 45 million samples per second, or a packet every ~5 us, each on different ports. The appliance also filters, triggers, indexes and archives the data, so it is important enough to optimize CPU during collection that I've spent several days off and on optimizing this collector.

Have you considered not using SetReadDeadline, but instead using an ordinary Read and sending the results on a channel?
The SetReadDeadline does not appear to be a factor right now. I am essentially using an ordinary Read, the results are pushed onto a circular buffer - channels are not performant enough to handle the data rates, especially with the multiple downstream readers.

Then some other goroutine can use timers and read from the channel as it pleases.
I don't believe timers would accomplish anything different that relinquishing to the scheduler via occasional Sleeps when the buffer is emptied.

I don't mind essentially forking the functionality from conn out into a new specialized UDP reader, but believe that my extreme scenario is showing an area where there is low hanging fruit for optimization. I believe many out there might be spinning up a Go routine to process a stream of data from a port. The obvious and simple way to occomplish this is with a read loop continually hitting the port with bocking conn.Read() calls. This uses quite a bit of CPU. In my case, where I can allow the buffer to fill for a millisecond, then process repeatedly, having a read call that returns immedietly when there is insufficient data simplifies the timing logic that would otherwise be needed to determine that the buffer is empty and polling is occurring.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2018

Thanks for the detailed reply. I suspect that for your use case you will be better off using syscall.Read. You want complete control over your descriptor; the poller is just adding overhead.

@matthalladay

This comment has been minimized.

Copy link
Author

commented Aug 29, 2018

My design when using a non-polled read is more transparent and CPU friendly. I'll look into syscall.Read, though quite a few of the internal helper functions are not exported so my initial thought is that this will require a bit of code to get it working. I haven't looked in detail yet though. Thanks for the quick replies. Though I am developing on Linux and targeting Linux we are keeping cross compilation as an option. In additional to simplifying timer logic across operating systems and platforms, a non-polled read might also reduce the large ntoskrnl.exe burden that we see on Windows - up to 5% per stream. I haven't tested this yet. If you think others might benefit and I can help, let me know.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2018

I personally would be extremely reluctant to add a non-polled (*net.UDPConn).Read to the standard library. That is encouraging a coding style that we do not recommend in Go. It would step away from the advantages of using an integrated poller for all network connections. It would have be plumbed through several layers of code. I do not think we should follow this path.

Looking back at your initial report, I noticed this:

Much of this is because when conn.Read() is called prior to a packet arriving the CPU spins on the read until data arrives, or until the ReadDeadline is hit.

That is not accurate. The code does not spin. SetReadDeadline is implemented by setting up a timer. The timer goroutine will sleep until the specified time, and then wake up the goroutine with the deadline. There is no spinning involved. Of course, if your deadline is very short, the overhead of the timer will indeed cause increased CPU use. Still, it might be interesting to see whether there is some inefficiency in the runtime package here.

@matthalladay

This comment has been minimized.

Copy link
Author

commented Aug 29, 2018

I didn't mean to imply that SetReadDeadline causes the CPU to spins, but rather requesting a Read when a packet is not yet available. The SetReadDeadline only happens rarely, it does not use any noticeable CPU unless set on every read, in which case it does, but I see no reason to set it every time unless using it to set a deadline for slowly arriving messages or a single response.

I appreciate the push to use syscall.Read, though initially I thought it would be a bit painful there are only a couple of line of code that required a change. There aren't a ton of examples so I'm pasting the gist below.
*Notes

  1. syscall.SetNonblock is required otherwise it is a blocking read
  2. I have not gone through the different return paths from syscall.Read, so ymmv on checking for <= 0 for n
  3. This should work for udp and tcp equally well
  4. I've only tested it briefly and only on Ubuntu 18.04
func (t *FTDV) run2(conn *net.UDPConn) {
	defer conn.Close()

	file, _ := conn.File()
	// handle error here
	fd := int(file.Fd())
	syscall.SetNonblock(fd, true) // otherwise it is a blocking read

	pkt := make([]byte, ftdvUDPPktSize)
...
	for {
		n, err := syscall.Read(fd, pkt)
		if n <= 0 { // returns -1 with "resource not available" as the error
			// relinquish, publish metrics, etc. likely that no more data is available
			continue
		}
		if err != nil {
			...
			// handle other errors - haven't looked to see how this corresponds
			// to 'n' so better processing may be needed here or better logic with
			// the 'n <= 0' call
			continue
		}
		// parsing logic here
		// at first I expect to potentially get the UDP packet header, but this wasn't the case
		// performs the same to adding in a return to the core conn.Read() function, thus
		// non-blocking used there as well?
	}
}

Though the logic is a little different between the two read methods I believe most of the CPU performance increase is in avoiding the polled read. Using a 1 ms sleep each time the buffer is emptied (either by time to process 25 packets with the polled read. Since each packet arrives at a precise 40 us interval, but the processing time is much faster than that, every 25 packets the time to process is compared against 350us. If it takes longer, then the read loop sleeps for 1 ms. In the case of the syscall.Read, whenever n <= 0 it sleeps for 1 ms.

syscall.Read - 7.5% of a core, 8 cores
conn.Read - 45.6% of a core, 8 cores

You're probably right regarding the integrated poller, etc. For my scenario, a non-blocking read would be used and having it exist within the fabric of the internals code would also be appreciated. This would be a Read that returns when there is no data available. There is no noticeable performance delta between syscall.Read and conn.Read with the continue being replace with a "return n, err".

A non-blocking read would also make it possible to use a pseudo read-time scheduler with a hot thread per work queue, though the strongest argument for it is the reduction in CPU usage. There are quite a few project in Go that stand up read processes on ports.

Thanks again and tell me if I can help with anything.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2018

I didn't mean to imply that SetReadDeadline causes the CPU to spins, but rather requesting a Read when a packet is not yet available.

The CPU doesn't spin there either, though. The goroutine sleeps waiting in the poller.

Glad that syscall.Read is working for you.

@matthalladay

This comment has been minimized.

Copy link
Author

commented Aug 29, 2018

Hopefully we can delete comments if my ignorance becomes too obvious. Looking at the fd_unix.go file (line 133), in the case where no data is available (potentially just when the connection is busy... not sure on this), the logic for (err != nil), (err == syscall.EAGAIN && fd.pd.pollable()), (err == nil) is taken over an over again. There does not appear to be any spot here where the process would sleep. When calling syscall.Read() directly without syscall.SetNonBlock(), the call only returns with data. When blocking is set to false this starts to return without data and with an error.

I guess the for loop in FD.Read() seems like the poller since this is where syscall.Read is iterated over, but I don't see a place where a sleep happens.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2018

In the EAGAIN case in internal/poll.(*FD).Read, the sleep occurs in the call to fd.pd.waitRead. It's this line: https://golang.org/src/internal/poll/fd_unix.go#L169 .

@matthalladay

This comment has been minimized.

Copy link
Author

commented Aug 30, 2018

Yes, I see now. If it is possible in the future to see an indication of whether waitRead is being hit I could use that rather than syscall.Read as an indication that the buffers are empty. Optionally some direct call that returns the current conn buffer level would also serve.

Thanks for the help.

@matthalladay

This comment has been minimized.

Copy link
Author

commented Sep 4, 2018

Update on syscall.Read in Go 1.11. Using the example code above the syscall.Reads will start failing after about a minute on "resource not available". A band-aid to this is to occasionally re-request the fd and set it to non-blocking, but this kind of starts to seem like magic. Do you know why the fd needs to be re-associated with the variable? Are syscalls not touching it in a way that the GC knows about it?

@matthalladay matthalladay reopened this Sep 4, 2018

@methane

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2018

I want Non-blocking version of Read (maybe, ReadOnce?) too.

In my case, I want to check TCP connection is closed or not, before sending packet on the connection.
See go-sql-driver/mysql#882.
If we can check "connection closed by server" before sending query, we can retry the query with new connection.

I tried to implement it with SyscallConn(). code

There are two problems:

  • syscall package is deprecated, and adding one more dependency (golang.org/x/sys) is hard decision for the project.
  • It's difficult to implement it in portable and right way.

There is another approach: use dedicated goroutine for reading.
But it increases CPU & memory usage. For request-response protocol, dedicated read goroutine is not idiomatic by performance point of view.
And it makes design complex and maintenance harder too.

@mikioh

This comment has been minimized.

Copy link
Contributor

commented Oct 27, 2018

@methane,

Please open a new issue, perhaps a proposal might be better. To me, your statement If we can check "connection closed by server" before sending a query, we can retry ... sounds like we want to have extra control between the user-surface API and the underlying transport protocol; for the trinity of API, transport protocol and IO mechanism. IIUC, this issue focuses on the combination of net.Conn interface and connectionless datagram-oriented transport protocols such as UDP, but your statement implies the use of connection-oriented transport protocols.

@troian

This comment has been minimized.

Copy link

commented Nov 4, 2018

Looks like related to #15735

@mattha78

This comment has been minimized.

Copy link

commented Nov 4, 2018

@troian

This comment has been minimized.

Copy link

commented Nov 5, 2018

I doubt #15735 allows now to check if any data in socket available to read. Though it is mail goal of that

@matthalladay

This comment has been minimized.

Copy link
Author

commented Nov 5, 2018

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Nov 5, 2018

I'm sorry, I don't understand what this issue is about now. As @mikioh suggested, if you have a problem using syscall.Read, please open a new issue describing that problem. Thanks.

@matthalladay

This comment has been minimized.

Copy link
Author

commented Nov 5, 2018

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Nov 5, 2018

Thanks for the note. We have no current plans for adding non-blocking I/O to net.Conn. Such a change should go through the proposal process described at https://golang.org/s/proposal, but I doubt it would be accepted. Non-blocking I/O without using a poller seems unlikely to scale to large numbers of descriptors.

@methane

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

I have created new issue: #28607

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.