net: Sendfile broken on some versions of Oracle Solaris when reading from tmpfs #20857
Recently, while setting up a buildlet for build.golang.org on Oracle Solaris, I discovered that TestSendfile was failing consistently.
However, when I ran ./all.bash manually on the buildlet system, the test passed. Running that specific test on its own also passed.
Suspecting some environmental difference, I set up an SMF service identical to the one used for the buildlet (see https://github.com/golang/build/blob/master/env/solaris-amd64/joyent/buildlet.xml), but had it run all.bash instead of the buildlet executable. That also passed.
Next, following @bradfitz's suggestions, I created a git clone of the current Go trunk in the buildlet's $HOME and ran the tests from that checkout.
I then realized the key difference: when I build and run manually, I use a "standard" ZFS-backed filesystem, but the buildlet uses /tmp/workdir-host-solaris-oracle-shawn as its workdir, and /tmp on Solaris is backed by tmpfs (a memory-based filesystem). Solaris tmpfs has subtle behavioural differences that occasionally expose bugs elsewhere.
If I copy src/net/testdata/Mark.Twain-Tom.Sawyer.txt to /tmp and change the test to read the file explicitly from there, it fails every time. If I instead use a ZFS-backed location such as /var/tmp, it passes every time.
As a workaround, since the buildlet defaults to /tmp/$host_key_name for its workdir, I've manually set its -workdir to /var/tmp/$host_key_name.
Finally, I tested on older Solaris releases and could not reproduce the problem, which suggests an OS bug. This is being actively investigated now. I will report back on whether this is an OS bug or a Go bug, and then either close this issue or submit a fix as appropriate.
As far as I can tell, this was just a subtle logic error in my original sendfile fix for #13892: the code wrongly assumed that an EAGAIN/EINTR result meant a partial write had occurred. In retrospect, that was a silly assumption. Instead, as on every other platform, we should simply try again; unlike other platforms, however, we still need to check whether a partial write occurred. This fix should be backwards-compatible with every Solaris variant Go supports.