lockf issue on Windows #4944

Closed
oandrieu opened this issue Dec 2, 2021 · 6 comments · Fixed by #4948

@oandrieu
Contributor

oandrieu commented Dec 2, 2021

I have a recurring issue with opam on Windows (the binary is from fdopen's OCaml for Windows): a lock operation fails with a permission denied error. This happens when there is contention on the lock, for instance when two CI agents start concurrently and each launches an opam install.

I can reproduce it with a shell command like this (installing an already installed package):
opam install dune & opam install dune ; wait
According to the logs, this seems to be because of the write lock on a switch lock file:

00:00.022  GSTATE                 LOAD-GLOBAL-STATE @ C:/ANSYSDev/opam
00:00.022  SYSTEM                 LOCK C:/ANSYSDev/opam/lock (none => read)
00:00.023  FILE(config)           Read C:/ANSYSDev/opam/config in 0.000s
00:00.023  RSTATE                 LOAD-REPOSITORY-STATE @ C:/ANSYSDev/opam
00:00.026  FILE(repos-config)     Read C:/ANSYSDev/opam/repo/repos-config in 0.002s
00:00.026  SYSTEM                 LOCK C:/ANSYSDev/opam/repo/state.cache (none => read)
00:00.112  RSTATE                 Loaded C:/ANSYSDev/opam/repo/state.cache in 0.086s
00:00.315  SYSTEM                 LOCK C:/ANSYSDev/opam/repo/state.cache (read => none)
00:00.315  RSTATE                 Cache found
00:00.315  STATE                  LOAD-SWITCH-STATE @ 4.08.1+msvc64
00:00.316  SYSTEM                 LOCK C:/ANSYSDev/opam/4.08.1+msvc64/.opam-switch/lock (none => write)
00:00.316  SYSTEM                 LOCK C:/ANSYSDev/opam/repo/lock (none => none)
00:00.316  SYSTEM                 LOCK C:/ANSYSDev/opam/config.lock (none => none)
Fatal error:
C:\cygwin64\usr\local\bin\opam.exe: "lockf" failed: Permission denied

but on a successful instance:

00:00.021  GSTATE                 LOAD-GLOBAL-STATE @ C:/ANSYSDev/opam
00:00.022  SYSTEM                 LOCK C:/ANSYSDev/opam/lock (none => read)
00:00.023  FILE(config)           Read C:/ANSYSDev/opam/config in 0.001s
00:00.023  RSTATE                 LOAD-REPOSITORY-STATE @ C:/ANSYSDev/opam
00:00.026  FILE(repos-config)     Read C:/ANSYSDev/opam/repo/repos-config in 0.003s
00:00.026  SYSTEM                 LOCK C:/ANSYSDev/opam/repo/state.cache (none => read)
00:00.133  RSTATE                 Loaded C:/ANSYSDev/opam/repo/state.cache in 0.106s
00:00.347  SYSTEM                 LOCK C:/ANSYSDev/opam/repo/state.cache (read => none)
00:00.347  RSTATE                 Cache found
00:00.347  STATE                  LOAD-SWITCH-STATE @ 4.08.1+msvc64
00:00.348  SYSTEM                 LOCK C:/ANSYSDev/opam/4.08.1+msvc64/.opam-switch/lock (none => write)
00:00.349  FILE(switch-config)    Read C:/ANSYSDev/opam/4.08.1+msvc64/.opam-switch/switch-config in 0.001s
00:00.350  FILE(switch-state)     Read C:/ANSYSDev/opam/4.08.1+msvc64/.opam-switch/switch-state in 0.001s
…

Any idea what's causing this, and whether there's a way to remedy it?

@oandrieu
Contributor Author

oandrieu commented Dec 2, 2021

The config report:

$ opam config report
# opam config report
# opam-version      2.0.8
# self-upgrade      no
# system            arch=x86_64 os=win32 os-distribution=cygwinports os-version=10.0.18362
# solver            builtin-mccs+glpk
# install-criteria  -removed,-count[version-lag,request],-count[version-lag,changed],-changed
# upgrade-criteria  -removed,-count[version-lag,solution],-new
# jobs              5
# repositories      1 (version-controlled)
# pinned            0
# current-switch    4.08.1+msvc64

@kit-ty-kate
Member

cc @MisterDA, just in case you know

@MisterDA
Contributor

MisterDA commented Dec 3, 2021

I must have had this issue some time ago, and so I trained myself never to use opam concurrently...

@dra27
Member

dra27 commented Dec 4, 2021

I fear that I have the same experience as @MisterDA - it happened at some point in the past and I filed it away!

I'm fairly sure, having just poked around a little, that the fix is obvious: opam's use of Unix.lockf with the test commands (F_TLOCK and F_TRLOCK) assumes that it will get EAGAIN if the lock is already taken (which is the Linux behaviour), but on Windows you get EACCES which, as it happens, is still POSIX-compliant.
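
As a minimal sketch of that point (not opam's actual code), a non-blocking lock attempt can treat both error codes as contention. The helper names `lock_is_busy` and `try_lock` are hypothetical; only `Unix.lockf`, `Unix.F_TLOCK` and the `Unix_error` codes come from OCaml's Unix library.

```ocaml
(* Sketch only; requires linking the unix library. *)

(* Hypothetical helper: does this exception mean "someone else holds the
   lock"? Both EAGAIN and EACCES are accepted, since either may be
   reported when a non-blocking lockf finds the file already locked. *)
let lock_is_busy = function
  | Unix.Unix_error ((Unix.EAGAIN | Unix.EACCES), _, _) -> true
  | _ -> false

(* Hypothetical helper: try to take an exclusive lock on the whole file
   without blocking. Returns [true] if the lock was acquired and [false]
   on contention; any other error is re-raised unchanged. *)
let try_lock fd =
  try
    Unix.lockf fd Unix.F_TLOCK 0;
    true
  with e when lock_is_busy e -> false
```

Matching on EAGAIN alone would let the EACCES case escape as an unhandled `Unix_error`, which is consistent with the fatal "Permission denied" message in the log above.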

I'll verify that that's actually the problem and open a PR.

@oandrieu
Contributor Author

oandrieu commented Dec 6, 2021

Oh, right, I thought opam was only taking regular (blocking) locks, and I could not reproduce with a simple program. But indeed, that's the problem: the error code returned when a non-blocking lock fails.
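
A simple program that does surface the error needs a non-blocking lock. The following is a hypothetical sketch, not the program used in this thread: `probe_lock` and its `hold` parameter are made up for illustration. Run from two processes at once against the same file, the second attempt should fail while the first still holds the lock, and the error it reports is expected to differ by platform.

```ocaml
(* Sketch only; requires linking the unix library and OCaml >= 4.08
   (for Fun.protect). *)

(* Hypothetical probe: open [path], attempt a non-blocking exclusive lock,
   hold it for [hold] seconds if acquired, and describe the outcome. *)
let probe_lock ~hold path =
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT] 0o644 in
  Fun.protect
    ~finally:(fun () -> Unix.close fd)
    (fun () ->
      match Unix.lockf fd Unix.F_TLOCK 0 with
      | () ->
        (* Lock acquired: keep it long enough for a second process to
           collide with it, then release it explicitly. *)
        Unix.sleepf hold;
        Unix.lockf fd Unix.F_ULOCK 0;
        "acquired"
      | exception Unix.Unix_error (err, _, _) ->
        (* Report whichever error code the platform chose. *)
        "failed: " ^ Unix.error_message err)
```

For example, two concurrently started executables that each call `print_endline (probe_lock ~hold:5.0 "test.lock")` would be one way to see which error the losing process gets.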

The fix is as simple as:

diff --git a/src/core/opamSystem.ml b/src/core/opamSystem.ml
index df2c4dd3..9ba7980a 100644
--- a/src/core/opamSystem.ml
+++ b/src/core/opamSystem.ml
@@ -1077,7 +1077,7 @@ let rec flock_update
          if Sys.win32 && kind <> `Lock_none then
            Unix.(lockf fd F_ULOCK 0);
          Unix.lockf fd (unix_lock_op ~dontblock:true flag) 0
-       with Unix.Unix_error (Unix.EAGAIN,_,_) ->
+       with Unix.Unix_error ((Unix.EAGAIN | Unix.EACCES),_,_) ->
          if dontblock then
            OpamConsole.error_and_exit `Locked
              "Another process has locked %s and non blocking mode enabled"
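
For context, the overall pattern the patch touches can be sketched as below. This is an illustration under assumptions, not opam's implementation: the `Locked` exception, the `acquire_write_lock` name, and the fall-back to a blocking `F_LOCK` when `dontblock` is false are all assumptions, since the diff only shows the `dontblock` branch.

```ocaml
(* Sketch only; requires linking the unix library. *)

(* Hypothetical exception standing in for opam's "exit with `Locked". *)
exception Locked of string

(* Hypothetical function: take an exclusive lock on [fd] (opened on [path]).
   First try without blocking; on contention either give up (when
   [dontblock] is set) or wait with a blocking lock (assumed behaviour). *)
let acquire_write_lock ?(dontblock = false) path fd =
  try Unix.lockf fd Unix.F_TLOCK 0
  with Unix.Unix_error ((Unix.EAGAIN | Unix.EACCES), _, _) ->
    (* Contention: EAGAIN is what Linux reports, EACCES what Windows does. *)
    if dontblock then raise (Locked path)
    else begin
      prerr_endline ("Another process has locked " ^ path ^ ", waiting...");
      Unix.lockf fd Unix.F_LOCK 0
    end
```

With only EAGAIN in the handler, the Windows error would bypass both branches, so neither the "locked" message nor the blocking wait would ever be reached.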

@rjbou
Collaborator

rjbou commented Dec 7, 2021

It's working! Thanks.
