Always use safe call to read for regular files and block devices on unix if the RTS is multi-threaded, regardless of O_NONBLOCK #166

Closed
Kleidukos opened this issue May 11, 2023 · 30 comments
Labels
approved Approved by CLC vote

Comments

@Kleidukos
Member

Kleidukos commented May 11, 2023

Hello CLC :)

From the original bug, "GHC uses O_NONBLOCK on regular files, which has no effect, and blocks the runtime":

GHC is trying to use O_NONBLOCK on regular files, which cannot work and will block when used through unsafe foreign calls like that.

The thread is fairly detailed, and quite interesting to read entirely

The current patch lives at: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7986

@arybczak has provided benchmarks:

unknown@electronics io-test $ cabal run -w ghc-9.4.4 io-test -- --csv baseline.csv
Resolving dependencies...
Build profile: -w ghc-9.4.4 -O1
In order, the following will be built (use -v for more details):
 - io-test-1.0 (exe:io-test) (first run)
Configuring executable 'io-test' for io-test-1.0..
Preprocessing executable 'io-test' for io-test-1.0..
Building executable 'io-test' for io-test-1.0..
[1 of 1] Compiling Main             ( src/Main.hs, /home/unknown/Programowanie/haskell/io-test/dist-newstyle/build/x86_64-linux/ghc-9.4.4/io-test-1.0/x/io-test/build/io-test/io-test-tmp/Main.o )
[2 of 2] Linking /home/unknown/Programowanie/haskell/io-test/dist-newstyle/build/x86_64-linux/ghc-9.4.4/io-test-1.0/x/io-test/build/io-test/io-test
All
  read
    2048:  OK (1.16s)
      386  ms ±  19 ms
    4096:  OK (1.09s)
      363  ms ± 5.6 ms
    8192:  OK (1.07s)
      355  ms ±  29 ms
    16384: OK (0.93s)
      308  ms ± 3.4 ms
    32768: OK (0.89s)
      295  ms ± 3.2 ms
    65536: OK (0.86s)
      288  ms ± 4.3 ms

All 6 tests passed (6.01s)
unknown@electronics io-test $ cabal run io-test -- --baseline baseline.csv 
Resolving dependencies...
Build profile: -w ghc-9.4.4.20230216 -O1
In order, the following will be built (use -v for more details):
 - io-test-1.0 (exe:io-test) (first run)
Configuring executable 'io-test' for io-test-1.0..
Preprocessing executable 'io-test' for io-test-1.0..
Building executable 'io-test' for io-test-1.0..
[1 of 1] Compiling Main             ( src/Main.hs, /home/unknown/Programowanie/haskell/io-test/dist-newstyle/build/x86_64-linux/ghc-9.4.4.20230216/io-test-1.0/x/io-test/build/io-test/io-test-tmp/Main.o )
[2 of 2] Linking /home/unknown/Programowanie/haskell/io-test/dist-newstyle/build/x86_64-linux/ghc-9.4.4.20230216/io-test-1.0/x/io-test/build/io-test/io-test
All
  read
    2048:  OK (1.33s)
      442  ms ± 9.7 ms, 14% more than baseline
    4096:  OK (1.26s)
      419  ms ± 5.1 ms, 15% more than baseline
    8192:  OK (1.21s)
      403  ms ±  14 ms, 13% more than baseline
    16384: OK (1.00s)
      332  ms ± 3.3 ms,  7% more than baseline
    32768: OK (0.91s)
      304  ms ± 3.7 ms,  3% more than baseline
    65536: OK (0.87s)
      291  ms ± 5.4 ms,       same as baseline

All 6 tests passed (6.58s)

with the following comment:

My opinion is that it doesn't matter how fast the current code is if it's incorrect. The assumption that file reads are always fast is clearly incorrect, especially nowadays, when a lot of code runs on third-party instances with a restricted amount of storage IOPS, such as AWS.
Anyone who does this has the potential to run into serious trouble that will be almost impossible to debug.
EDIT: also, this isn't a theoretical concern. I've run into this problem, struggled with fixing the root cause for 2 weeks as suddenly nothing made sense anymore (because logging within the application also became unreliable), then an unrelated fix that reduced the amount of IO activity made the problem go away and I stumbled onto #15153 randomly while browsing the issue tracker (and only because @nh2 refreshed the ticket by posting a MR) a couple months later.


From a personal standpoint, I can attest that in industrial environments that make use of network-based storage like AWS, this is a very important fix (disclaimer: all the companies I've worked at in recent years use this type of storage for application servers).

It must be noted that this is happening in parallel with a rewrite of the I/O Manager with io_uring, so while this is a fix for a bug, it is not bound to be a permanent solution.

@mixphix
Collaborator

mixphix commented May 11, 2023

The actual merge request is located here. It's a one-line change that adds a check to see if the file is a RegularFile. This seems like a strict improvement to me, but I'd like to hear other members' thoughts!

@mixphix
Collaborator

mixphix commented May 11, 2023

I am a little concerned about the interplay with the rewrite of the I/O manager. @bgamari could you explain the intent of that rewrite and the effect it will have on the base library, if any?

@bgamari

bgamari commented May 11, 2023

@mixphix, I suspect that you mean the io-uring backend in particular. Using O_NONBLOCK in the existing I/O paths should not jeopardize the ability to introduce an io-uring backend. Moreover, such a backend would have the advantage of not suffering from the overhead introduced by O_NONBLOCK.

@mixphix
Collaborator

mixphix commented May 11, 2023

Yes, thanks. That's reass"uring". :)

@Bodigrim
Collaborator

As someone who uses Haskell to access slow network drives, I'm utterly in favor.

@hasufell
Member

I'd like to get input from @Rufflewind and @Mistuke

Note that other packages such as file-io/unix etc may need patching. Otherwise we create subtle differences across the ecosystem.

E.g.: https://github.com/hasufell/file-io/blob/master/System/File/Posix.hs

@treeowl

treeowl commented May 13, 2023

Could someone explain the problem with O_NONBLOCK and slow network drives?

@Kleidukos
Member Author

Kleidukos commented May 13, 2023

@treeowl it is extensively explained in the ticket's thread on Gitlab.

@Rufflewind
Member

I don't think turning off O_NONBLOCK is the proper solution.

I think the real bug is on this line:

  | isNonBlocking fd = unsafe_read

It's probably a bad idea to assume that O_NONBLOCK is meaningful. The POSIX spec only defines the semantics of O_NONBLOCK on certain kinds of files. Implementations like Linux seem to treat it as a hint, honored where possible and ignored if not.

I think the proper fix should be to remove this hack from readRawBufferPtr and introduce an alternate (opt-in) API for the rare case where the user wants to avoid the unsafe foreign overhead.


On a separate matter, it might also be worth diving into the unix library and making sure that blockable foreign calls are marked as safe. For example, I noticed openAt is unsafe, but I believe it's possible for openAt to block.

@Ericson2314
Contributor

I am a little concerned about the interplay with the rewrite of the I/O manager.

Yeah, architecturally, I view this as an implementation detail of the I/O manager, which may be in base rather than the rts but is (to me) still morally part of the runtime.

This makes it a funny thing to consider here, procedurally.


It is possible there could be some "pre IO manager" IO primitives that are more directly exposing what the OS provides, unvarnished. These would be very dangerous to use in most code, but also great building blocks for writing the IO manager in the first place.

That strikes me as a good architecture, and then those "pre IO manager" IO primitives would be squarely in CLC remit since they are regular public library functions, with little implementation wiggle room left by their "public" behavior.

@Bodigrim
Collaborator

Bodigrim commented May 26, 2023

@treeowl it is extensively explained in the ticket's thread on Gitlab.

@Kleidukos @arybczak @nh2 could you please provide a self-contained summary, which can serve as a proposal? Precisely because GitLab tickets are extremely extensive. It's not like an average Haskell developer has any idea about O_NONBLOCK at all. Please include reflection on suggestions from #166 (comment) and https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7986#note_498171.

@Bodigrim Bodigrim added the awaits-proposal Discussion has not resulted in a formal proposal yet label May 26, 2023
@Bodigrim
Collaborator

Bodigrim commented Jun 19, 2023

Since no one got time to prepare a summary of the issue, here is mine.


The Haskell FFI allows one to specify the safety level of a foreign call. According to the Haskell Report 2010, in a single-threaded environment:

A safe call is less efficient, but guarantees to leave the Haskell system in a state that allows callbacks from the external code. In contrast, an unsafe call, while carrying less overhead, must not trigger a callback into the Haskell system.

This is, however, quite benign: in the most common case, when the FFI connects Haskell to an independent low-level C library, there are no callbacks. The situation gets worse in a multi-threaded environment: if one thread executes an unsafe call and a global GC is triggered by another thread, then all threads get blocked until the call returns. Quoting the GHC User's Guide:

In this situation, the garbage collector cannot proceed, and this can lead to performance issues that often appear under high load, as other threads are more active and thus more prone to trigger global garbage collection.

This means that if you need to make a foreign call to a function that takes a long time or potentially blocks, then you should mark it safe and use -threaded. Some library functions make such calls internally; their documentation should indicate when this is the case.

On the other hand, a foreign call to a function that is guaranteed to take a short time, and does not call back into Haskell can be marked unsafe. This works both for the single-threaded and the multi-threaded runtime. When considering what “a short time” is, a foreign function that does comparable work to what Haskell code does between each heap allocation (not very much), is a good candidate.
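To make the stall described above tangible, here is a minimal sketch that is not taken from the thread. It imports usleep(3) twice, once unsafe and once safe, and counts how often a background Haskell thread gets to run while each call is in progress. The names usleepUnsafe, usleepSafe and ticksDuring are made up for illustration, and the sketch assumes a Unix-like system and a build with -threaded (in the non-threaded RTS the timer signal may cut usleep short).

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad (forever)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import Foreign.C.Types (CInt (..), CUInt (..))

-- The same C function, usleep(3), imported with both safety levels.
foreign import ccall unsafe "unistd.h usleep"
  usleepUnsafe :: CUInt -> IO CInt

foreign import ccall safe "unistd.h usleep"
  usleepSafe :: CUInt -> IO CInt

-- | Run an action while a background Haskell thread tries to tick every
-- 10 ms; return how many ticks it managed before the action finished.
ticksDuring :: IO a -> IO Int
ticksDuring action = do
  counter <- newIORef (0 :: Int)
  ticker <- forkIO . forever $ do
    threadDelay 10000
    atomicModifyIORef' counter (\n -> (n + 1, ()))
  _ <- action
  ticks <- readIORef counter
  killThread ticker
  pure ticks

main :: IO ()
main = do
  -- Each call sleeps for half a second inside C code.
  unsafeTicks <- ticksDuring (usleepUnsafe 500000)
  safeTicks <- ticksDuring (usleepSafe 500000)
  putStrLn ("ticks during unsafe call: " ++ show unsafeTicks)
  putStrLn ("ticks during safe call:   " ++ show safeTicks)
```

Built with ghc -threaded and run on a single capability, one would expect the unsafe call to starve the ticker and the safe call to let it run. Exact counts depend on the machine and RTS options, so none are shown here.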

A common pattern is to define both safe and unsafe FFI bindings and switch between them depending on the size of payload or RTS configuration. Here is an example from bytestring:

isValidUtf8 :: ShortByteString -> Bool
isValidUtf8 sbs@(unSBS -> ba#) = accursedUnutterablePerformIO $ do
  let n = length sbs
  -- Use a safe FFI call for large inputs to avoid GC synchronization pauses
  -- in multithreaded contexts.
  -- This specific limit was chosen based on results of a simple benchmark, see:
  -- https://github.com/haskell/bytestring/issues/451#issuecomment-991879338
  -- When changing this function, also consider changing the related function:
  -- Data.ByteString.isValidUtf8
  i <- if n < 1000000 || not (isPinned ba#)
     then cIsValidUtf8 ba# (fromIntegral n)
     else cIsValidUtf8Safe ba# (fromIntegral n)
  IO (\s -> (# touch# ba# s, () #))
  return $ i /= 0

-- We import bytestring_is_valid_utf8 both unsafe and safe. For small inputs
-- we can use the unsafe version to get a bit more performance, but for large
-- inputs the safe version should be used to avoid GC synchronization pauses
-- in multithreaded contexts.

foreign import ccall unsafe "bytestring_is_valid_utf8" cIsValidUtf8
  :: ByteArray# -> CSize -> IO CInt

foreign import ccall safe "bytestring_is_valid_utf8" cIsValidUtf8Safe
  :: ByteArray# -> CSize -> IO CInt

Getting closer to the topic, base uses the same approach when it comes to reading files. Namely, on Unix System.Posix.Internals defines two versions of the same system call read(2):

foreign import capi unsafe "HsBase.h read"
   c_read :: CInt -> Ptr Word8 -> CSize -> IO CSsize

foreign import capi safe "HsBase.h read"
   c_safe_read :: CInt -> Ptr Word8 -> CSize -> IO CSsize

They are used further in GHC.IO.FD:

readRawBufferPtr :: String -> FD -> Ptr Word8 -> Int -> CSize -> IO Int
readRawBufferPtr loc !fd !buf !off !len
#if defined(javascript_HOST_ARCH)
  = fmap fromIntegral . uninterruptibleMask_ $
    throwErrnoIfMinus1 loc (c_read (fdFD fd) (buf `plusPtr` off) len)
#else
  | isNonBlocking fd = unsafe_read -- unsafe is ok, it can't block
  | otherwise    = do r <- throwErrnoIfMinus1 loc
                                (unsafe_fdReady (fdFD fd) 0 0 0)
                      if r /= 0
                        then read
                        else do threadWaitRead (fromIntegral (fdFD fd)); read
  where
    do_read call = fromIntegral `fmap`
                      throwErrnoIfMinus1RetryMayBlock loc call
                            (threadWaitRead (fromIntegral (fdFD fd)))
    read        = if threaded then safe_read else unsafe_read
    unsafe_read = do_read (c_read (fdFD fd) (buf `plusPtr` off) len)
    safe_read   = do_read (c_safe_read (fdFD fd) (buf `plusPtr` off) len)
#endif

The idea is that we use unsafe_read defined via c_read when RTS is single-threaded, because no harm can be caused: there are no other threads to block. Otherwise we use safe_read defined via c_safe_read when RTS is multi-threaded, because reading a file can take a long time and we do not want every other thread to be blocked on this.

But if isNonBlocking fd holds, then we force unsafe_read independently of the RTS configuration. isNonBlocking reads the fdIsNonBlocking field of the FD record, which is set by GHC.IO.FD.mkFD. And high-level APIs such as System.IO.openFile always strive to open files in non-blocking mode.
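The decision just described can be summarised as a small pure function. This is a simplified model for illustration only: ReadCall and chooseRead are hypothetical names, and the sketch leaves out the fdReady / threadWaitRead polling that the real readRawBufferPtr performs before a blocking read.

```haskell
module Main (main) where

-- | Which flavour of the read(2) binding ends up being called.
data ReadCall = UnsafeRead | SafeRead
  deriving (Eq, Show)

-- | Simplified model of the choice made in readRawBufferPtr before the patch.
-- First argument: is the RTS threaded?
-- Second argument: does isNonBlocking hold for the descriptor?
chooseRead :: Bool -> Bool -> ReadCall
chooseRead _ True = UnsafeRead      -- "unsafe is ok, it can't block"
chooseRead True False = SafeRead    -- threaded RTS, blocking descriptor
chooseRead False False = UnsafeRead -- single-threaded RTS

main :: IO ()
main = mapM_ report [(t, nb) | t <- [False, True], nb <- [False, True]]
  where
    report (t, nb) =
      putStrLn
        ( "threaded=" ++ show t
            ++ " nonBlocking=" ++ show nb
            ++ " -> " ++ show (chooseRead t nb)
        )
```

The problematic row is the threaded RTS with a descriptor flagged as non-blocking: the model picks UnsafeRead even though, for a regular file, the flag says nothing about how long read(2) will take.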

At the lowest level, non-blocking mode is implemented by passing the O_NONBLOCK flag to open(2):

O_NONBLOCK or O_NDELAY
              When possible, the file is opened in nonblocking mode.
              Neither the open() nor any subsequent I/O operations on
              the file descriptor which is returned will cause the
              calling process to wait.
              ...
              Note that this flag has no effect for regular files and
              block devices; that is, I/O operations will (briefly)
              block when device activity is required, regardless of
              whether O_NONBLOCK is set.
              ...

While regular files cannot block indefinitely as other types of file descriptors such as sockets and pipes can, they always block the caller for a brief amount of time: you cannot read a file from a floppy drive or network share without delay, and no amount of O_NONBLOCK or other flags can change this physical limitation.


The proposed patch attempts to fix the issue at the level of GHC.IO.FD.mkFD:

-                fdIsNonBlocking = fromEnum is_nonblock
+                fdIsNonBlocking = fromEnum (is_nonblock && fd_type /= RegularFile)

That's not totally bulletproof: after all, users can create FDs on their own or change fdIsNonBlocking as they wish, because the fields of FD are publicly exposed. But, well, if clients desperately want to override the default behaviour, we should not stand in their way.
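Because the FD record is exported, the flag can be inspected directly. The following is a rough sketch rather than anything from the merge request: inspect and describeType are hypothetical helpers, fd-demo.txt is a scratch file created only for the demonstration, and the code assumes a Unix build of base, where FD carries the fdIsNonBlocking field.

```haskell
module Main (main) where

import GHC.IO.Device (IODeviceType (..))
import qualified GHC.IO.Device as Device
import qualified GHC.IO.FD as FD
import System.IO (IOMode (ReadMode))

-- | IODeviceType has no Show instance, so name the constructors by hand.
describeType :: IODeviceType -> String
describeType Directory = "directory"
describeType Stream = "stream"
describeType RegularFile = "regular file"
describeType RawDevice = "raw device"

-- | Open a path the way the high-level API does (requesting non-blocking
-- mode) and report the device type together with the stored flag.
inspect :: FilePath -> IO (IODeviceType, Int)
inspect path = do
  (fd, devType) <- FD.openFile path ReadMode True
  let flag = FD.fdIsNonBlocking fd
  Device.close fd
  pure (devType, flag)

main :: IO ()
main = do
  writeFile "fd-demo.txt" "hello\n"
  (devType, flag) <- inspect "fd-demo.txt"
  putStrLn ("device type:     " ++ describeType devType)
  putStrLn ("fdIsNonBlocking: " ++ show flag)
```

Reading the one-line diff above, one would expect the flag to be nonzero for this regular file on a GHC without the patch and zero on a GHC that includes it. The output is deliberately not shown, since it depends on the compiler version.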

An alternative approach would be to perform the check in readRawBufferPtr (and likewise in readRawBufferPtrNoBlock and writeRawBufferPtr), but at that point fd_type :: IODeviceType is no longer accessible and we do not want to call System.Posix.Internals.fdStat on each read. It's equally cumbersome to perform the check in the higher-level API, again because IODeviceType is not at hand.

Thus I think that the proposers made a reasonable choice, best possible without major refactoring. One nitpick is that block devices should also be excluded (see man excerpt above):

-                fdIsNonBlocking = fromEnum is_nonblock
+                fdIsNonBlocking = fromEnum (is_nonblock && fd_type /= RegularFile && fd_type /= RawDevice)
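The effect of the extended condition is easiest to see as a truth table. Below is a minimal sketch using the real IODeviceType from GHC.IO.Device; patchedFdIsNonBlocking is a hypothetical stand-alone function that mirrors the expression in the diff, not code from base.

```haskell
module Main (main) where

import GHC.IO.Device (IODeviceType (..))

-- | Model of the value stored in fdIsNonBlocking under the extended patch:
-- the O_NONBLOCK request is only honoured for descriptors that are neither
-- regular files nor raw (block) devices.
patchedFdIsNonBlocking :: Bool -> IODeviceType -> Int
patchedFdIsNonBlocking isNonblock fdType =
  fromEnum (isNonblock && fdType /= RegularFile && fdType /= RawDevice)

main :: IO ()
main = mapM_ report namedTypes
  where
    -- IODeviceType has no Show instance, so pair each constructor with a name.
    namedTypes =
      [ ("Directory", Directory)
      , ("Stream", Stream)
      , ("RegularFile", RegularFile)
      , ("RawDevice", RawDevice)
      ]
    report (name, ty) =
      putStrLn (name ++ ": " ++ show (patchedFdIsNonBlocking True ty))
```

With the flag at 0 for regular files and raw devices, isNonBlocking no longer holds for them, so a threaded RTS falls through to the safe read path.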

I would also appreciate improved documentation for fdIsNonBlocking, educating users about its meaning and effect. openFileBlocking / withFileBlocking / hGetBufNonBlocking / hPutBufNonBlocking also would need a documentation update, describing their precise meaning.


@Kleidukos @arybczak @nh2 could you please indicate that you are still interested in the proposal? I'm somewhat worried that I had to prepare the long text above myself; I'm not a proposer, after all. If there is no active champion to respond to the feedback within two weeks, we'll have little choice but to close this as abandoned.

@nh2
Member

nh2 commented Jun 22, 2023

@Bodigrim Yes, excellent summary!

Some detail remarks:

While regular files cannot block indefinitely as other types of file descriptors such as sockets and pipes can, they always block the caller for a brief amount of time: you cannot read a file from a floppy drive or network share without delay

Where "brief" can, in practice and in today's real-world industry settings, be:

  • 10 milliseconds for a spinning hard drive seek, or
  • 2 seconds for a spinning hard drive that's under load from other requests, or
  • 2 seconds for a network share far away, or
  • 20 hours for a network share currently undergoing maintenance or having a network disconnect (file systems such as Ceph, sshfs, or other FUSE-based network file systems block opening files when the connection is down).

Hit any of those for multiple files concurrently, and the Haskell program freezes for a while. That is the concrete motivation for this proposal.

@Bodigrim
Collaborator

@nh2 if you update the MR adding fd_type /= RawDevice and improving documentation around *NonBlocking functions, we'll be able to proceed with a vote.

@Bodigrim
Collaborator

I put up a new, slightly extended MR https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11338. Besides some pieces of documentation, the only material change is one line:

-                fdIsNonBlocking = fromEnum is_nonblock
+                fdIsNonBlocking = fromEnum (is_nonblock && fd_type /= RegularFile && fd_type /= RawDevice)

For anyone new to this discussion, the summary of the issue is available above, at #166 (comment).

I'll change hats and trigger a vote shortly.

@Bodigrim
Collaborator

Actually, given that this is a non-trivial piece of code and it would be unfortunate to spoil multiyear efforts by calling a vote too early, let me ask for non-binding opinions first.

Dear CLC members, could you please read #166 (comment) and opine on https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11338?
@tomjaguarpaw @parsonsmatt @angerman @velveteer @hasufell @mixphix

@angerman

@Bodigrim thank you for the excellent summary! I currently have trouble finding a common case where this would break existing code and behaviour, and other than potentially reducing stalls, I can't find any. As such I'm in favour of this change.
+1

@velveteer
Contributor

The performance regression is somewhat concerning, no? Is Marlow's comment (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7986#note_502536) being considered in this proposal as well?

Otherwise, I am in favor of this.

@Bodigrim
Collaborator

The performance regression is somewhat concerning, no?

We are essentially fixing a DDoS vulnerability (= a single slow file I/O can render a multithreaded program unresponsive), so a modest performance regression is acceptable. We'd better get it correct first and optimize the constant factor later.

Is Marlow's comment (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7986#note_502536) being considered in this proposal as well?

My view is that such an optimization could be the subject of a future proposal. It's not straightforward to implement: at the very least, one has to extend data FD with a new field recording the type of the descriptor (pipe / socket or regular file). This itself may trigger non-trivial performance consequences and further investigation. I don't really have expertise in this area; I just stepped up to refresh the MR because the original proposers were seemingly inactive.

@Rufflewind
Member

at least one has to extend data FD with a new field keeping type of a descriptor (pipe / socket or regular file).

If we ever go with that approach, I would recommend adding a SafeOrUnsafe field rather than a FileType field. I do not think it was ever a good idea to try and guess SafeOrUnsafe from FileType to begin with. I think the users should make that choice explicitly based on their use case, with Safe always a safe albeit slow default.
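One possible reading of this suggestion is sketched below. Everything here is hypothetical: SketchFD, CallSafety, mkSketchFD, withUnsafeCalls and pickCall do not exist in base, and the sketch only illustrates "safe by default, unsafe by explicit opt-in" rather than a design anyone in the thread has committed to.

```haskell
module Main (main) where

import Foreign.C.Types (CInt)

-- | How foreign calls on a descriptor should be made.
data CallSafety = SafeCall | UnsafeCall
  deriving (Eq, Show)

-- | HYPOTHETICAL descriptor record; this is not the FD type from base.
data SketchFD = SketchFD
  { sketchFd :: CInt
  , sketchCallSafety :: CallSafety
  }
  deriving (Show)

-- | Default constructor: always start with the safe (slower) choice.
mkSketchFD :: CInt -> SketchFD
mkSketchFD n = SketchFD {sketchFd = n, sketchCallSafety = SafeCall}

-- | Explicit opt-in for callers who know their reads cannot stall.
withUnsafeCalls :: SketchFD -> SketchFD
withUnsafeCalls fd = fd {sketchCallSafety = UnsafeCall}

-- | Pick the call flavour. As in the existing code, a single-threaded RTS
-- always uses the unsafe call; otherwise the user's stored choice decides.
pickCall :: Bool -> SketchFD -> CallSafety
pickCall threaded fd
  | threaded = sketchCallSafety fd
  | otherwise = UnsafeCall

main :: IO ()
main = do
  let fd = mkSketchFD 3
  print (pickCall True fd)
  print (pickCall True (withUnsafeCalls fd))
```

The point of the design is that no file-type heuristic is involved: the stored choice is stated by the caller, and the default errs on the side of not stalling other threads.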

@Bodigrim
Collaborator

Any more non-binding opinions? @tomjaguarpaw @mixphix @parsonsmatt @hasufell

@hasufell
Member

hasufell commented Sep 30, 2023

I'm +1

The performance regression can be fixed later. I'm kind of baffled by the current behavior and wasn't aware of it.


As a note: the title is confusing. This has technically nothing to do with passing O_NONBLOCK to open, but whether we use safe or unsafe call to read, which is about how Haskell deals with FFI. And the patch does not remove the O_NONBLOCK flags from open. They're still passed to the FFI functions.

Edit: as indicated in my MR comment I'd change the title to: Always use safe call to read for regular files and block devices on unix if the RTS is multi-threaded, regardless of O_NONBLOCK

@hasufell
Member

A common pattern is to define both safe and unsafe FFI bindings and switch between them depending on the size of payload or RTS configuration. Here is an example from bytestring

RTS configuration for this sounds like a good way to allow people to opt out of the changed behavior or tweak it?

@nh2
Member

nh2 commented Sep 30, 2023

the title is confusing. This has technically nothing to do with passing O_NONBLOCK to open, but whether we use safe or unsafe call to read

@hasufell You are right. That's my fault: I originally wrote the commit message "Do not use O_NONBLOCK on regular files", when in fact the passing of O_NONBLOCK is unchanged. The message should be fixed as you suggest.

@Kleidukos Kleidukos changed the title Do not use O_NONBLOCK on regular files Always use safe call to read for regular files and block devices on unix if the RTS is multi-threaded, regardless of O_NONBLOCK Oct 1, 2023
@Bodigrim
Collaborator

Bodigrim commented Oct 5, 2023

Dear CLC members, let's vote on https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11338/diffs to fix a DDoS vulnerability (= a single slow file I/O, e.g., on a network shared drive, can render an entire multithreaded program unresponsive). This patch is backwards-compatible and does not introduce any new API.

@hasufell @tomjaguarpaw @parsonsmatt @mixphix @velveteer @angerman


+1 from me.

@angerman

angerman commented Oct 6, 2023

+1, as it doesn't break existing code. I see the concern about a potential change in runtime behavior. That does, however, rather lead me to believe we should have an additional explicit API to let the end user choose rather than rely on a heuristic.

@hasufell
Member

hasufell commented Oct 6, 2023

+1

@velveteer
Contributor

+1

@mixphix
Collaborator

mixphix commented Oct 8, 2023

+1

@Bodigrim
Collaborator

Bodigrim commented Oct 8, 2023

Thanks all, that’s enough votes to approve unconditionally.

Indeed this area may benefit from a more invasive contribution, refactoring data FD along the lines of the discussion above (+ impact assessment, + migration policy). An enthusiastic contributor is welcome :)

pull bot pushed a commit to sysfce2/ghc that referenced this issue Oct 8, 2023
@Bodigrim Bodigrim closed this as completed Oct 9, 2023
@Bodigrim Bodigrim added approved Approved by CLC vote and removed awaits-proposal Discussion has not resulted in a formal proposal yet labels Oct 9, 2023