Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in Shell.mkdir_p #300

Open
mroch opened this issue Aug 26, 2019 · 5 comments
Open

Race condition in Shell.mkdir_p #300

mroch opened this issue Aug 26, 2019 · 5 comments

Comments

@mroch
Copy link

mroch commented Aug 26, 2019

ocamlbuild/src/shell.ml

Lines 62 to 64 in 4ef4b18

let rec mkdir_p dir =
if sys_file_exists dir then ()
else (mkdir_p (Filename.dirname dir); mkdir dir)

If another process creates the directory after the sys_file_exists call but before the mkdir, then mkdir raises.

This has been happening in the wild for Flow (facebook/flow#8027). I suspect it's because of make -j and we have multiple ocamlbuild targets so they run in parallel.

A safer approach could be to suppress the EEXIST: https://github.com/janestreet/core/blob/e8cf8a6d340c4af1e8177591782fb3e326eb0a0b/src/core_unix.ml#L1463-L1478

@gasche
Copy link
Member

gasche commented Aug 31, 2019

Thanks for the report.

A very safe thing to be doing would be to replace ; mkdir dir by ; try_mkdir dir -- check again for the file existence after the recursive parent creation. This is, of course, not removing the race entirely, but it is an easy change to reason about. The actually-correct approach you suggest sounds good, but harder to implement (we would have to cross the abstraction boundaries of ocamlbuild's system-utils abstractions, and I don't understand the portability consequences of such a change).

It's actually easy to reproduce this failure, using a makefile that looks as follows:

DIR=foo
FILE=code.ml

setup:
	mkdir -p $(DIR)
	echo "let x = 1" > $(DIR)/$(FILE)
	echo "run 'make -j all' to reproduce the failure'"

clean:
	ocamlbuild -clean

dist-clean: clean
	rm $(DIR)/$(FILE)
	rmdir $(DIR)

.PHONY: all test1 test2
all: test1 test2

test1:
	$(MAKE) _build/$(DIR)/$(FILE)

test2:
	$(MAKE) _build/$(DIR)/$(FILE)

_build/$(DIR)/$(FILE):
	ocamlbuild -no-log $(DIR)/$(FILE)

I will first test the safe-and-incomplete change; if it fixes the reproduction case, maybe that's a good first step and I can do a minor release. Then I will look at the harder-and-complete fix.

@gasche
Copy link
Member

gasche commented Aug 31, 2019

So:

  • the simple fix does not in fact fix the race
  • a more complex "simple fix" is to just use mkdir -p on Unix systems, which does fix the race by delegating concurrency control to the filesystem (I like that approach); this requires Windows-specific logic (Windows' mkdir does not support the -p flag), for now I kept the legacy implementation on Windows.
  • there is also a concurrency race in the creation of the log file, which I'm trying to look at now

@dbuenzli
Copy link
Collaborator

  • (Windows' mkdir does not support the -p flag), for now I kept the legacy implementation on Windows.

As far as I can tell (but not execute right now), you can simply use mkdir without -p. In Windows it create paths by default, see here.

@gasche
Copy link
Member

gasche commented Aug 31, 2019

Yes, but the most important behavior for the present issue is what happens in case the directory or path already exists (not to fail in that case), which I didn't find documented. I'm also not completely sure of how to know which mkdir command will be run by OCamlbuild, as it goes through bash (I hope it's the Windows utility, and only Cygwin, which has a different os_type, uses some coreutils imitation?).

In any case, I changed the implementation in the PR a bit from the previous message: now what I do is that I just ignore any failure of mkdir in the mkdir_p case, which is fine for two processes racing to create the same directory. When that happens, the user still sees an error message in the console/log, which is not nice, but (1) the implementation change is very simple and (2) maybe users shouldn't run ocamlbuild in parallel like that anyway, and it's fine to tolerate some less-nice behavior in this case.

@gasche
Copy link
Member

gasche commented Aug 31, 2019

I merged #302 (with fixes for the race-condition-related failures I could reproduce) in master.

@mroch, would it be possible to try Flow builds from the master branch, and see if the failures have gone away?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants