Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

withTempDirectory doesn't always delete the temporary directory #3

Closed
bitc opened this issue Oct 28, 2016 · 4 comments
Closed

withTempDirectory doesn't always delete the temporary directory #3

bitc opened this issue Oct 28, 2016 · 4 comments

Comments

@bitc
Copy link

bitc commented Oct 28, 2016

withTempDirectory targetDir template =
  Exception.bracket
    (liftIO (createTempDirectory targetDir template))
    (liftIO . ignoringIOErrors . removeDirectoryRecursive)

This function uses bracket which splits it up into three stages:

  1. "acquire" (create the directory)
  2. "in-between" (user action)
  3. "release" (recursively delete the directory)

Consider the following scenario:

  • Stage 1 ("acquire") completes successfully.
  • Stage 2 ("user action") places many files inside the temporary directory and completes successfully.
  • Stage 3 begins: There are many files inside the temporary directory, and they are deleted one by one. But before they have all been deleted, an async exception occurs. Even though we are currently in a state of "masked" async exceptions (thanks to bracket), the individual file delete operations are "interruptible" and thus our mask will be pierced. The function will return before all of the temporary files have been deleted (and of course the temporary directory itself will also remain).

This is not good. "with-style" functions are expected to guarantee proper and complete clean up of their resources. And this is not just a theoretical issue: there is a significant likelihood that the problem can occur in practice, for example with a program that uses a temporary directory with many files and the user presses Ctrl-C.

I have 3 ideas:

  1. Make use of "uninterruptibleMask" during the cleanup. The ramifications of this need to be thought through...
  2. During the cleanup, fork another thread that does the grunt work. This thread can then be waited on so that we only return when everything has been deleted. If the wait is interrupted via an async exception then the work thread will continue to do its job in the background until completion.
  3. Declare this as a non-issue and leave everything the way it currently is. I am strongly against this. This is a genuine bug that must be addressed.

Idea (2) has the attractive quality of avoiding the use of scary "uninterruptibleMask". But it is dangerous because it can leave behind the temporary directory in unexpected ways. For example, with this approach, the trivial program main = withTempDirectory "/tmp" "tmp" (pure ()) might not delete the temporary directory if Ctrl-C is pressed just at the right moment.

I believe that idea (1) is the best solution. Note that, in a sense, the current code already uses "uninterruptibleMask" for each individual file delete operation (they are blocking FFI calls). So switching to one big scary "uninterruptibleMask" does not morally change anything about the interruptible-ness property of the function.

@UnkindPartition
Copy link
Owner

UnkindPartition commented Oct 28, 2016

"with-style" functions are expected to guarantee proper and complete clean up of their resources

They are, but unfortunately, that's impossible; see here.

That said, you bring up a very good point, which probably deserves its own section in that article.

I agree with your reasoning and would accept a pull request that implements (1) and documents this semantics in the haddocks.

@UnkindPartition
Copy link
Owner

Hey @bitc,

I added a test that check that withTempDirectory deletes its directory. The test passes without any changes to the code (such as uninterruptibleMask).

I think this because both c_unlink and c_rmdir are declared ffi-unsafe and are therefore uninterruptible.

If you disagree with my analysis, I'd be curious to see a test case that demonstrates this possibility.

@bitc
Copy link
Author

bitc commented Sep 13, 2017

I still believe that what I originally wrote still applies.

But:

  1. Maybe I'm wrong and things are Ok. When I have some free time I'll try to write up a real test case that demonstrates the issue.
  2. Even if the original issue is still lurking, it may be the right decision to just accept it as it is (for various reasons).

@bitc
Copy link
Author

bitc commented Sep 13, 2017

Looking at your test, I have a feeling that 100 temporary files may not be enough. Maybe try bumping that number up to a much much higher number?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants