os: "async" file IO #6817

Open
dvyukov opened this Issue Nov 22, 2013 · 21 comments

@dvyukov
Member

dvyukov commented Nov 22, 2013

Read/Write file operations must not consume threads; they should use polling
similar to the net package.

This has been raised several times before.

Here is a particular example with godoc:
https://groups.google.com/d/msg/golang-nuts/TeNvQqf4tO4/dskZuFH5QVYJ
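
For illustration, the failure pattern is simply fan-out over blocking file reads; a minimal sketch (mine, not from the original report; the path is a placeholder):

```go
package main

import (
	"os"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			f, err := os.Open("testdata/somefile") // placeholder path
			if err != nil {
				return
			}
			defer f.Close()
			buf := make([]byte, 1<<20)
			// read(2) on a regular file runs to completion; while it
			// waits for the disk, it pins an OS thread, so 1000 slow
			// reads can mean ~1000 threads.
			f.Read(buf)
		}()
	}
	wg.Wait()
}
```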
@minux

Member

minux commented Nov 22, 2013

Comment 1:

what about stat syscalls? readdir?
one more reason to expose the poller?
@dvyukov

Member

dvyukov commented Nov 22, 2013

Comment 2:

> what about stat syscalls? readdir?
I don't know for now. Do you have any thoughts?
> one more reason to expose the poller?
Yes, it is related.
@minux

Member

minux commented Nov 22, 2013

Comment 3:

I don't think there is an asynchronous stat syscall available, and IIRC most
event-based web servers take great pains to work around the
stat(2)-taking-up-a-thread problem (e.g. dedicated stat(2) thread pools).
Similarly for readdir: is there a pollable version available?
I don't know whether readdir/stat contributes to the godoc problem, but they
might become a problem if GOPATH is large enough.
@ianlancetaylor

Contributor

ianlancetaylor commented Nov 22, 2013

Comment 4:

The pollable version of readdir is getdents, which the syscall package already uses.
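
For reference, a minimal Linux sketch of reading directory entries via syscall.Getdents (the primitive the os package bottoms out in on Linux):

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	fd, err := syscall.Open(".", syscall.O_RDONLY, 0)
	if err != nil {
		panic(err)
	}
	defer syscall.Close(fd)

	buf := make([]byte, 8192)
	var names []string
	for {
		n, err := syscall.Getdents(fd, buf)
		if err != nil || n == 0 {
			break
		}
		// ParseDirent extracts file names from the raw dirent records.
		_, _, names = syscall.ParseDirent(buf[:n], -1, names)
	}
	fmt.Println(names)
}
```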
@bradfitz

Member

bradfitz commented Nov 23, 2013

Comment 5:

This continually bites me. I have an interface that has both network and
filesystem implementations. The network one works great (firing off a bounded
number of goroutines: say, 1000), but the filesystem implementation of the
interface kills the OS, and my code has to manually limit itself, which feels
like a layering violation. The runtime should do the right thing for me.

runtime/debug.SetMaxThreads sets the max threads Go uses before it blows up.
If we can't do async filesystem IO everywhere (and I don't think we can), then
I think we should have something like runtime/debug.SetMaxFSThreads that
controls the size of the pool of threads doing filesystem operations but
blocks instead of crashing. That means for the read/write syscalls, we'd have
to know which fds are marked non-blocking (network stuff) and which aren't.

Or we put all this capping logic in pkg/os, perhaps opt-in:

pkg os
func SetMaxFSThreads(n int)
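
In the meantime, the manual limiting described above usually looks like a counting semaphore around filesystem calls. A minimal sketch of that workaround (my code, not a proposed API; the package name and the cap of 64 are arbitrary):

```go
package fslimit // hypothetical package name

import "io/ioutil"

// fsGate caps how many filesystem operations run at once, which in turn
// caps how many OS threads they can pin. 64 is an arbitrary cap.
var fsGate = make(chan struct{}, 64)

// ReadFileCapped is ioutil.ReadFile behind the gate: it blocks (rather
// than spawning another thread) when 64 operations are already in flight.
func ReadFileCapped(name string) ([]byte, error) {
	fsGate <- struct{}{}        // acquire a slot
	defer func() { <-fsGate }() // release it
	return ioutil.ReadFile(name)
}
```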
@dvyukov

Member

dvyukov commented Nov 23, 2013

Comment 6:

Do you use read/write? Or more involved ops like readdir?
Can you create a repro for this issue?
@bradfitz

Member

bradfitz commented Nov 23, 2013

Comment 7:

I use read, write, open, close, readdir, stat, lstat.  godoc is a repro.  camlistore is
a repro.  My original Go bug report to Russ on May 5, 2010 was a repro.
I'll write something small, though, and attach it here.
@rsc

Contributor

rsc commented Dec 4, 2013

Comment 8:

Labels changed: added release-none, removed go1.3maybe.

@rsc

Contributor

rsc commented Dec 4, 2013

Comment 9:

Labels changed: added repo-main.

@dvyukov

Member

dvyukov commented Oct 27, 2014

Comment 10:

FTR, net.FileConn does not work for serial ports, etc. FileConn calls
newFileFD, which calls syscall.GetsockoptInt(fd, syscall.SOL_SOCKET,
syscall.SO_TYPE), and that fails for non-sockets.
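
That failure mode is easy to reproduce; a minimal sketch (the device path is a stand-in):

```go
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	f, err := os.Open("/dev/ttyS0") // hypothetical serial device
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer f.Close()
	// FileConn probes the descriptor with getsockopt(SO_TYPE), which
	// returns ENOTSOCK for anything that is not a socket.
	_, err = net.FileConn(f)
	fmt.Println("FileConn:", err)
}
```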
@niemeyer

Contributor

niemeyer commented Oct 29, 2014

Comment 11:

Also FTR, the typical way to unblock an f.Read that is waiting for more data
is to call f.Close, and the current implementation of these methods makes that
racy. An implementation similar to the net package's would fix that too.
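
A minimal sketch of the pattern (mine, using a pipe as the blocking descriptor):

```go
package main

import (
	"log"
	"os"
	"time"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close() // keep the write end open so the Read blocks

	go func() {
		buf := make([]byte, 1024)
		// Blocks until data arrives or the descriptor is torn down.
		_, err := r.Read(buf)
		log.Println("read returned:", err)
	}()

	time.Sleep(100 * time.Millisecond)
	// Intended to unblock the pending Read. With a poller-based
	// implementation (as in net) this deterministically wakes the
	// reader with an error; with the thread-blocking implementation
	// the Read stays parked in read(2) and Close races with it.
	r.Close()
	time.Sleep(100 * time.Millisecond)
}
```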
@tv42

tv42 commented Feb 13, 2015

(This came up in a conversation today and I wanted to make sure people don't start off with incorrect assumptions.)

I want to be really clear on this: there is no such thing as regular file or directory I/O that doesn't wait for disk on a cache miss. I am not talking about serial ports or such here, but files and directories. Regular files are always "ready" as far as select(2) and friends are concerned, so technically they never "block", and "non-blocking" is the wrong word. But they may wait for actual disk IO to happen, and there is realistically no way to avoid that in the general case (in POSIX/Linux).

The network poller / epoll has nothing to contribute here. There is no version of read(2) and friends where the syscall would return early, without waiting for disk, if there's nothing cached. Go really can do very little there.
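
The kernel even enforces this: adding a regular file to an epoll set fails outright. A Linux-only sketch (the file path is just an example):

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		panic(err)
	}
	defer syscall.Close(epfd)

	fd, err := syscall.Open("/etc/hostname", syscall.O_RDONLY, 0) // any regular file
	if err != nil {
		panic(err)
	}
	defer syscall.Close(fd)

	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(fd)}
	// Regular files are considered always ready, so the kernel refuses
	// to poll them: this fails with EPERM ("operation not permitted").
	err = syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev)
	fmt.Println("epoll_ctl on a regular file:", err)
}
```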

People have been talking about extending Linux to implement non-waiting file IO (e.g. http://lwn.net/Articles/612483/), but that's not realistic today.

I don't see Go having much choice beyond threads doing syscalls. The real question in my mind is whether there is a way to limit syscall concurrency, to avoid swamping the CPU/OS with too many threads, while still avoiding deadlocks.

And just to minimize chances of confusion, file AIO ("Async I/O") is something very different, and not applicable to this conversation. It's a very restrictive API (actually, multiple), bypasses useful features like caching, and doesn't necessarily perform well at all.

@tv42

tv42 commented Feb 13, 2015

@dvyukov io_submit is the Linux AIO API (as opposed to POSIX AIO). It's a separate codepath that depends on the filesystem doing the right thing; the implementations have been problematic, and using AIO introduces a whole bunch of risk. The original implementation assumed O_DIRECT, and that legacy remains; non-O_DIRECT operation is even more problematic. O_DIRECT is not safe to use for generic file operations, because others accessing the file will go through the buffer cache. Without O_DIRECT, the generic version of io_submit falls back to synchronous processing. Some filesystems don't handle unaligned accesses well. In some circumstances (e.g. journaling details, file space not preallocated, etc.), io_submit has to wait until the operation completes instead of just submitting an async request; this tends to be more typical without O_DIRECT. The default limit for pending requests is only 128; after that io_submit starts blocking. Finally, io_submit only helps with the basic read/write workload, not open(2), stat(2), etc.

I'm not saying it won't work, but I also would not be surprised if a change moving file IO to io_submit got reverted within a few months.

@dvyukov

Member

dvyukov commented Feb 13, 2015

OK, then everybody should switch to Windows :)

@gopherbot

gopherbot commented Feb 10, 2017

CL https://golang.org/cl/36799 mentions this issue.

@gopherbot

gopherbot commented Feb 10, 2017

CL https://golang.org/cl/36800 mentions this issue.

gopherbot pushed a commit that referenced this issue Feb 13, 2017

net: refactor poller into new internal/poll package
This will make it possible to use the poller with the os package.

This is a lot of code movement but the behavior is intended to be
unchanged.

Update #6817.
Update #7903.
Update #15021.
Update #18507.

Change-Id: I1413685928017c32df5654ded73a2643820977ae
Reviewed-on: https://go-review.googlesource.com/36799
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>

gopherbot pushed a commit that referenced this issue Feb 15, 2017

os: use poller for file I/O
This changes the os package to use the runtime poller for file I/O
where possible. When a system call blocks on a pollable descriptor,
the goroutine will be blocked on the poller but the thread will be
released to run other goroutines. When using a non-pollable
descriptor, the os package will continue to use thread-blocking system
calls as before.

For example, on GNU/Linux, the runtime poller uses epoll. epoll does
not support ordinary disk files, so they will continue to use blocking
I/O as before. The poller will be used for pipes.

Since this means that the poller is used for many more programs, this
modifies the runtime to only block waiting for the poller if there is
some goroutine that is waiting on the poller. Otherwise, there is no
point, as the poller will never make any goroutine ready. This
preserves the runtime's current simple deadlock detection.

This seems to crash FreeBSD systems, so it is disabled on FreeBSD.
This is issue 19093.

Using the poller on Windows requires opening the file with
FILE_FLAG_OVERLAPPED. We should only do that if we can remove that
flag if the program calls the Fd method. This is issue 19098.

Update #6817.
Update #7903.
Update #15021.
Update #18507.
Update #19093.
Update #19098.

Change-Id: Ia5197dcefa7c6fbcca97d19a6f8621b2abcbb1fe
Reviewed-on: https://go-review.googlesource.com/36800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
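
A quick way to observe the effect of this change (my sketch, not from the CL): park many goroutines in reads on pipes, which are pollable, and check how many OS threads the runtime created:

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	var writers []*os.File
	for i := 0; i < 500; i++ {
		r, w, err := os.Pipe()
		if err != nil {
			panic(err)
		}
		writers = append(writers, w) // keep write ends open so reads block
		go func() {
			buf := make([]byte, 1)
			r.Read(buf) // parks the goroutine on the poller, not on a thread
		}()
	}
	time.Sleep(time.Second)
	// With poller-based file I/O this stays small; with thread-blocking
	// reads it would approach the number of blocked readers.
	fmt.Println("OS threads created:", pprof.Lookup("threadcreate").Count())
	for _, w := range writers {
		w.Close()
	}
}
```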
@manishrjain

manishrjain commented May 16, 2017

Just to add some numbers to the discussion, we're facing this problem as well.

Running fio on an Amazon EC2 i3.large instance, we're able to get 64K IOPS for random reads with a 4K block size, 8 jobs, and a 4G file size. (Other times I've seen it go close to 100K IOPS.)

We created a small Go program to do the exact same thing using goroutines, and it doesn't budge above 20K IOPS. In fact, throughput won't increase any further once the number of goroutines reaches the number of cores. This strongly indicates that Go is paying the cost of context switching, because it does a blocking read in every iteration of the loop.

Full Go code here: https://github.com/dgraph-io/badger-bench/blob/master/randread/main.go
go build . && ./randread --dir /mnt/data/fio --preads 6500000 --jobs <num-cores>

It seems like async IO is the only way to achieve full IO throughput in Go. SSDs push more throughput with every generation, so there has to be a way for Go to exploit that.
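
The core of such a benchmark reduces to a loop like the following (a simplified sketch, not the linked program; the data file path is a placeholder and the file is assumed to be multiple GB):

```go
package main

import (
	"flag"
	"math/rand"
	"os"
	"sync"
)

func main() {
	jobs := flag.Int("jobs", 4, "number of concurrent readers")
	num := flag.Int("num", 1000000, "reads per job")
	flag.Parse()

	f, err := os.Open("/mnt/data/fio/testfile") // placeholder path
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}
	blocks := fi.Size() / 4096 // assumes a large, multi-GB file

	var wg sync.WaitGroup
	for j := 0; j < *jobs; j++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			rng := rand.New(rand.NewSource(seed)) // per-goroutine source, no global lock
			buf := make([]byte, 4096)
			for i := 0; i < *num; i++ {
				off := rng.Int63n(blocks) * 4096
				// Each ReadAt is a blocking pread(2): while it waits
				// for the SSD, it occupies an OS thread.
				f.ReadAt(buf, off)
			}
		}(int64(j))
	}
	wg.Wait()
}
```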

@bradfitz

Member

bradfitz commented May 16, 2017

@manishrjain, what fio command line are you comparing against?

Btw, your benchmark has a global mutex (don't use rand.Intn in goroutines if you want performance). That would show up if you look at contention profiles.
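
(For reference, the usual fix is a per-goroutine source; rand.Intn and friends serialize all callers on one mutex. A tiny sketch:)

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	// Top-level rand.Intn locks a global source shared by all goroutines.
	// A per-goroutine *rand.Rand has no shared state on the hot path.
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	fmt.Println(rng.Intn(100))
}
```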

@manishrjain

manishrjain commented May 16, 2017

This is the fio command on my computer, and the output:

$ fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4k --direct=0 --size=4G --numjobs=4 --runtime=60 --group_reporting
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
...
fio-2.19
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=159MiB/s,w=0KiB/s][r=40.8k,w=0 IOPS][eta 00m:00s]
randread: (groupid=0, jobs=4): err= 0: pid=19525: Wed May 17 09:47:41 2017
   read: IOPS=43.8k, BW=171MiB/s (179MB/s)(10.3GiB/60001msec)
    slat (usec): min=2, max=13539, avg=90.15, stdev=98.95
    clat (usec): min=1, max=27856, avg=2829.90, stdev=708.48
     lat (usec): min=6, max=27873, avg=2920.05, stdev=724.33
    clat percentiles (usec):
     |  1.00th=[ 1512],  5.00th=[ 1816], 10.00th=[ 1992], 20.00th=[ 2224],
     | 30.00th=[ 2416], 40.00th=[ 2608], 50.00th=[ 2800], 60.00th=[ 2992],
     | 70.00th=[ 3184], 80.00th=[ 3408], 90.00th=[ 3696], 95.00th=[ 3920],
     | 99.00th=[ 4448], 99.50th=[ 4704], 99.90th=[ 7200], 99.95th=[ 8896],
     | 99.99th=[15168]
    lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 250=0.01%
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=10.11%, 4=85.93%, 10=3.92%, 20=0.03%, 50=0.01%
  cpu          : usr=1.10%, sys=10.23%, ctx=1464338, majf=0, minf=153
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=2627651,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=171MiB/s (179MB/s), 171MiB/s-171MiB/s (179MB/s-179MB/s), io=10.3GiB (10.8GB), run=60001-60001msec

And the corresponding Go program:
go build . && ./randread --dir ~/diskfio --jobs 4 --num 1000000 --mode 1

I switched from the global rand to a local rand, and it doesn't show up in the block profiler or the CPU profiler. fio is getting 43.8K IOPS. My Go program is giving me ~25K, checked via sar -d 1 -p (the Go program reports less than what I see via sar, so there must be a flaw in my code somewhere).
