New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A couple spots async exceptions aren't handled gracefully #187

Open
mgsloan opened this Issue Mar 19, 2017 · 15 comments

Comments

Projects
None yet
8 participants
@mgsloan

mgsloan commented Mar 19, 2017

Based on commercialhaskell/stack#3073 , I suspected hackage-security wasn't handling async exceptions quite right. Indeed, a search for SomeException led me to find the following spots where SomeException gets caught and not rethrown:

Relevant to the linked issue:

catchChecked (select m callback) $ \ex -> do

Part of hackage-repo-tool:

handle (throwIO . TarGzError (prettyTargetPath' opts pathTarGz)) $ do

I also noticed that the UpdateFailed exception constructor is never used -

@dcoutts

This comment has been minimized.

Member

dcoutts commented Mar 20, 2017

@mgsloan thanks.

@edsko looking at that use of catchChecked, I don't see how the Throws e => constraint is resolved. There doesn't seem to be any instance around.

@edsko

This comment has been minimized.

Contributor

edsko commented Mar 20, 2017

@edsko looking at that use of catchChecked, I don't see how the Throws e => constraint is resolved. There doesn't seem to be any instance around.

That's the whole point. https://www.well-typed.com/blog/2015/07/checked-exceptions/ .

@hvr

This comment has been minimized.

Member

hvr commented Mar 20, 2017

fwiw, I'd suggest adding a link to that blogpost to the Hackage.Security.Util.Checked module (which provides catchChecked) as the concept is really a little bit non-obvious to the non-initiated... :-)

@mgsloan

This comment has been minimized.

mgsloan commented Apr 3, 2017

It may also be good to have it wait for the lock to become available - commercialhaskell/stack#3055 (comment)

@domenkozar

This comment has been minimized.

domenkozar commented Feb 13, 2018

Could this be the case why commercialhaskell/stack#3055 happens even with just ctrl-c (sigint/sigterm)? I've seen it happen on CI, where I'd be surprised anything is calling SIGKILL.

@domenkozar

This comment has been minimized.

domenkozar commented Feb 13, 2018

Managed to reproduce it with:

$ rm -rf ~/.cabal
$ cabal update
# wait 15s
ctrl-c

$ cabal update
Downloading the latest package list from hackage.haskell.org
/home/ielectric/.cabal/packages/hackage.haskell.org/hackage-security-lock: createDirectory: already exists (File exists)
@nh2

This comment has been minimized.

Member

nh2 commented Feb 13, 2018

I think this is because of

-- | Attempt to create a filesystem lock in the specified directory
--
-- Given a file @/path/to@, we do this by attempting to create the directory
-- @//path/to/hackage-security-lock@, and deleting the directory again
-- afterwards. Creating a directory that already exists will throw an exception
-- on most OSs (certainly Linux, OSX and Windows) and is a reasonably common way
-- to implement a lock file.
withDirLock :: Path Absolute -> IO a -> IO a
withDirLock dir = bracket_ takeLock releaseLock
where
lock :: Path Absolute
lock = dir </> fragment "hackage-security-lock"
takeLock, releaseLock :: IO ()
takeLock = createDirectory lock
releaseLock = removeDirectory lock

So this seems to just create a plain directory to implement a lock. Any form of createDirectory >> removeDirectory as done there cannot survive SIGKILL or computer crashes. I would have expected this to be implemented with flock().

@hvr

This comment has been minimized.

Member

hvr commented Feb 13, 2018

@nh2 unfortunately flock() isn't portable (a problem we're having trouble with already in cabal; but I'm working on something to address that; once I'm done we can use it hackage-security too)

@domenkozar

This comment has been minimized.

domenkozar commented Feb 13, 2018

https://hackage.haskell.org/package/filelock-0.1.1.2/docs/System-FileLock.html works well on Windows as long as you don't use lock paths for anything.

@hvr

This comment has been minimized.

Member

hvr commented Feb 13, 2018

@domenkozar I'm aware of the prior art (and there's a lot of history regarding file/record locking in Unix...) :-)

@chrisdone

This comment has been minimized.

Member

chrisdone commented Feb 13, 2018

@nh2 unfortunately flock() isn't portable (a problem we're having trouble with already in cabal; but I'm working on something to address that; once I'm done we can use it hackage-security too)

@domenkozar I'm aware of the prior art :-)

So what are you working on to address this?

@domenkozar

This comment has been minimized.

domenkozar commented Feb 13, 2018

@domenkozar I'm aware of the prior art :-)

Ah sorry, I assume you don't due to your previous comment. I guess my implicit question was if we can use filelock to solve this properly and what would be the cons (besides obvious extra dependency) to fix this bug?

snoyberg added a commit to snoyberg/hackage-security that referenced this issue Feb 13, 2018

Demonstrate one aspect of issue haskell#187
Due to the nature of directory-based locking, if the process exits
before exception handlers are run, the directory is never removed and
the system remains in a "locked" state indefinitely. This can happen due
to SIGKILL or power failure, as well as a blocked thread when main
exits. This repro follows that last approach, forking a thread and then
letting main exit while that locked thread delays.

On my machine, I get the following output for running issue-187.sh:

$ ./issue-187.sh
+ rm -rf tmp
+ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.2.2
+ cabal --version
cabal-install version 2.0.0.1
compiled using version 2.0.1.1 of the Cabal library
+ cabal sandbox init
Writing a default package environment file to
/Users/michael/Documents/hackage-security/cabal.sandbox.config
Using an existing sandbox located at
/Users/michael/Documents/hackage-security/.cabal-sandbox
+ cabal install ./hackage-security
Resolving dependencies...
In order, the following will be installed:
hackage-security-0.5.2.2 (reinstall)
Warning: Note that reinstalls are always dangerous. Continuing anyway...
Notice: installing into a sandbox located at
/Users/michael/Documents/hackage-security/.cabal-sandbox
Configuring hackage-security-0.5.2.2...
Building hackage-security-0.5.2.2...
Installed hackage-security-0.5.2.2
+ cabal exec -- ghc -hide-all-packages -package hackage-security
-package base -package directory -Wall -Werror issue-187.hs
[1 of 1] Compiling Main             ( issue-187.hs, issue-187.o )
Linking issue-187 ...
+ ./issue-187
+ ./issue-187
issue-187:
/Users/michael/Documents/hackage-security/tmp/hackage-security-lock:
createDirectory: already exists (File exists)
@snoyberg

This comment has been minimized.

Contributor

snoyberg commented Feb 13, 2018

I've written a commit to demonstrate one failure mode for directory-based file locking:

snoyberg@ffb83f9

Quoting the commit message:

Due to the nature of directory-based locking, if the process exits
before exception handlers are run, the directory is never removed and
the system remains in a "locked" state indefinitely. This can happen due
to SIGKILL or power failure, as well as a blocked thread when main
exits. This repro follows that last approach, forking a thread and then
letting main exit while that locked thread delays.

Regarding the larger issue of async exception safety here: one possibility is to implement the same technique used in safe-exceptions in the Hackage.Security.Util.Checked module, namely:

  • Differentiate at the type level between sync and async exceptions using the SomeAsyncException type
  • Ensure that all exceptions passed to throwIO are synchronous
  • Disallow catchChecked, handleChecked, and tryChecked from catching any asynchronous exception.

Note that, AFAICT, the async exception safety issue is almost entirely orthogonal to the issue of using directory creation/removal as a form of locking.

snoyberg added a commit to snoyberg/hackage-security that referenced this issue Feb 13, 2018

Detect asynchronous exceptions via their types haskell#187
This commit uses the same async-exception detection mechanism as is used
by the safe-exceptions package, via checking if the given exception is
cast to a SomeAsyncException. (On older GHCs without SomeAsyncException,
it contains a hard-coded list of async exception types.) It then ensures
that:

* Throwing via throwChecked always generates a synchronous exception
* Catching via catchChecked (et al) never catches an asynchronous
  exception

Unfortunately, I don't currently have a reliable test case to ensure
that this fixes the problems described in haskell#187. Hopefully with this
patch available we can begin testing cabal-install and Stack against the
change and see if it resolves the issues.
@snoyberg

This comment has been minimized.

Contributor

snoyberg commented Feb 13, 2018

I've opened up #202 with the async exception detection logic I described above.

snoyberg added a commit to commercialhaskell/stack that referenced this issue Feb 13, 2018

Avoid async exception mishandling #3073
This commit SHOULD NOT BE MERGED TO master. It adds an extra-dep for
hackage-security from Hackage to work around #3073 and
haskell/hackage-security#187. Hopefully this
will be merged and released to Hackage.
@snoyberg

This comment has been minimized.

Contributor

snoyberg commented Feb 13, 2018

I can also report that, with my patch in #202, Stack no longer has the buggy behavior described in commercialhaskell/stack#3073 (see my comment commercialhaskell/stack#3073 (comment)).

snoyberg added a commit to snoyberg/hackage-security that referenced this issue Feb 13, 2018

Use file instead of dir locking haskell#187
This commit simply imports the code from the filelock package verbatim
into a subdirectory, filelock. Depending on filelock as an external
package instead would be more straightforward, but I'm not sure what the
rules for external dependencies are here.

snoyberg added a commit to snoyberg/hackage-security that referenced this issue Feb 13, 2018

Use file instead of dir locking haskell#187
This commit simply imports the code from the filelock package verbatim
into a subdirectory, filelock. Depending on filelock as an external
package instead would be more straightforward, but I'm not sure what the
rules for external dependencies are here.

hvr added a commit that referenced this issue Feb 14, 2018

Use file instead of dir locking #187 (#203)
* Use file instead of dir locking #187

This commit simply imports the code from the filelock package verbatim
into a subdirectory, filelock. Depending on filelock as an external
package instead would be more straightforward, but I'm not sure what the
rules for external dependencies are here.

* Switch to upstream filelock

Given that the extra dependency doesn't seem to be a problem, remove the
inlined code. If in fact the dependency should be avoided, just ignore
this commit and use the parent.

hvr added a commit that referenced this issue Feb 14, 2018

Detect asynchronous exceptions via their types #187 (#202)
* Detect asynchronous exceptions via their types #187

This commit uses the same async-exception detection mechanism as is used
by the safe-exceptions package, via checking if the given exception is
cast to a SomeAsyncException. (On older GHCs without SomeAsyncException,
it contains a hard-coded list of async exception types.) It then ensures
that:

* Throwing via throwChecked always generates a synchronous exception
* Catching via catchChecked (et al) never catches an asynchronous
  exception

Unfortunately, I don't currently have a reliable test case to ensure
that this fixes the problems described in #187. Hopefully with this
patch available we can begin testing cabal-install and Stack against the
change and see if it resolves the issues.

* Treat Timeout as an async exception too

* Remove exceptions not actually considered async

phadej added a commit to phadej/hackage-security that referenced this issue Feb 14, 2018

hvr added a commit that referenced this issue Feb 15, 2018

Merge pull request #207 from phadej/ghc-io-handle-lock
This effectively reverts the half-solution from #187 and replaces it with a strictly
better but still preliminary solution which avoids introducing regressions (i.e. #205).

This still needs to be addressed in a more principled way long-term.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment