[Ubuntu] [3.12.0.0-prerelease] Semaphore does not exist #9993

Open
jasagredo opened this issue May 9, 2024 · 16 comments

Comments

@jasagredo
Collaborator

Describe the bug
End of log says it all:

Benchmark 2: cabal build -O1 --semaphore all
Resolving dependencies...
Build profile: -w ghc-9.8.2 -O1
In order, the following will be built (use -v for more details):
...
 - ouroboros-consensus-cardano-0.15.0.0 (test:cardano-test) (first run)
Created semaphore called cabal_semaphore_1a with 24 slots.
Configuring test suite 'doctest' for ouroboros-consensus-0.17.0.0...
Configuring library for strict-sop-core-0.1.0.0...
Preprocessing test suite 'doctest' for ouroboros-consensus-0.17.0.0...
Building test suite 'doctest' for ouroboros-consensus-0.17.0.0...
cabal_semaphore_1a: semOpen: does not exist (No such file or directory)
Preprocessing library for strict-sop-core-0.1.0.0...
Building library for strict-sop-core-0.1.0.0...
cabal_semaphore_1a: semOpen: does not exist (No such file or directory)
Error: [Cabal-7125]
Failed to build test:doctest from ouroboros-consensus-0.17.0.0.
Failed to build strict-sop-core-0.1.0.0 (which is required by test:mock-test from ouroboros-consensus-diffusion-0.15.0.0, test:infra-test from ouroboros-consensus-diffusion-0.15.0.0 and others).

To Reproduce

I have reproduced it twice with the following invocation:

hyperfine --prepare "cabal clean" "cabal build -O1 -j all" "cabal build -O1 --semaphore all" [--show-output]

On IntersectMBO/ouroboros-consensus@98dbae0

System information

  • Ubuntu 22.04 under WSL, Windows 11
❯ cabal --version
cabal-install version 3.11.0.0
compiled using version 3.12.0.0 of the Cabal library

❯ ghc --version
The Glorious Glasgow Haskell Compilation System, version 9.8.2

Despite that version string, this is the cabal installed by the command on the 3.12.0.0 release:

❯ ghcup list
...
✔✔ cabal 3.12.0.0-prerelease                           stray

@jasagredo
Collaborator Author

I changed branches, made some changes, and the error persists regardless of whether I run cabal clean first:

❯ cabal build all --semaphore
...
 - ouroboros-consensus-cardano-0.15.0.0 (test:cardano-test) (first run)
Created semaphore called cabal_semaphore_10 with 24 slots.
Preprocessing test suite 'doctest' for ouroboros-consensus-0.17.0.0...
Building test suite 'doctest' for ouroboros-consensus-0.17.0.0...
cabal_semaphore_10: semOpen: does not exist (No such file or directory)
Error: [Cabal-7125]
Failed to build test:doctest from ouroboros-consensus-0.17.0.0.

@mpickering
Collaborator

It seems the project doesn't build with 9.8.2, so I can't reproduce as per the instructions:

[nix-shell:~/ouroboros-consensus]$  cabal build -O1 --semaphore all
Resolving dependencies...
Error: [Cabal-7107]
Could not resolve dependencies:
[__0] trying: ouroboros-consensus-0.17.0.0 (user goal)
[__1] trying: vector-0.13.1.0 (dependency of ouroboros-consensus)
[__2] next goal: cardano-crypto-class (dependency of ouroboros-consensus)
[__2] rejecting: cardano-crypto-class-2.1.4.0 (conflict: pkg-config package libblst-any, not found in the pkg-config database)
[__2] rejecting: cardano-crypto-class-2.1.3.0 (conflict: vector==0.13.1.0, cardano-crypto-class => vector<0.13)
[__2] skipping: cardano-crypto-class; 2.1.2.0, 2.1.1.0, 2.1.0.2, 2.1.0.1, 2.1.0.0 (has the same characteristics that caused the previous version to fail: excludes 'vector' version 0.13.1.0)
[__2] trying: cardano-crypto-class-2.0.0.1
[__3] next goal: base (dependency of ouroboros-consensus)
[__3] rejecting: base-4.19.1.0/installed-862d (conflict: cardano-crypto-class => base>=4.14 && <4.17)
[__3] skipping: base; 4.19.1.0, 4.19.0.0, 4.18.2.0, 4.18.1.0, 4.18.0.0, 4.17.2.1, 4.17.2.0, 4.17.1.0, 4.17.0.0 (has the same characteristics that caused the previous version to fail: excluded by constraint '>=4.14 && <4.17' from 'cardano-crypto-class')
[__3] rejecting: base; 4.16.4.0, 4.16.3.0, 4.16.2.0, 4.16.1.0, 4.16.0.0, 4.15.1.0, 4.15.0.0, 4.14.3.0, 4.14.2.0, 4.14.1.0, 4.14.0.0, 4.13.0.0, 4.12.0.0, 4.11.1.0, 4.11.0.0, 4.10.1.0, 4.10.0.0, 4.9.1.0, 4.9.0.0, 4.8.2.0, 4.8.1.0, 4.8.0.0, 4.7.0.2, 4.7.0.1, 4.7.0.0, 4.6.0.1, 4.6.0.0, 4.5.1.0, 4.5.0.0, 4.4.1.0, 4.4.0.0, 4.3.1.0, 4.3.0.0, 4.2.0.2, 4.2.0.1, 4.2.0.0, 4.1.0.0, 4.0.0.0, 3.0.3.2, 3.0.3.1 (constraint from non-reinstallable package requires installed instance)
[__3] fail (backjumping, conflict set: base, cardano-crypto-class, ouroboros-consensus)
After searching the rest of the dependency tree exhaustively, these were the goals I've had most trouble fulfilling: base, ouroboros-consensus, vector, cardano-crypto-class
Try running with --minimize-conflict-set to improve the error message.

@mpickering
Collaborator

I can't reproduce this on NixOS.

To reproduce, I entered the shell with nix-shell shell.nix after setting up the binary cache.

Then I ran cabal build -O1 -w $(which ghc-9.8.2) --semaphore all and observed that the semaphore was passed to the ghc invocation.

Perhaps something to do with WSL?

@mpickering
Collaborator

Results from the repo that @jasagredo tried the flag on:

Benchmark 1: cabal build -w /nix/store/kyh03hm4ni2d5rhydlsgcv88c1vdlylg-ghc-9.8.2/bin/ghc-9.8.2 -O1 -j all
  Time (mean ± σ):     186.922 s ±  5.796 s    [User: 468.499 s, System: 54.065 s]
  Range (min … max):   180.807 s … 197.243 s    10 runs
 
Benchmark 2: cabal build -O1 -w /nix/store/kyh03hm4ni2d5rhydlsgcv88c1vdlylg-ghc-9.8.2/bin/ghc-9.8.2 --semaphore all
  Time (mean ± σ):     161.768 s ±  7.623 s    [User: 564.039 s, System: 62.299 s]
  Range (min … max):   151.206 s … 173.469 s    10 runs

@jasagredo
Collaborator Author

Confirming whether this is a WSL thing would be nice. I might try on a full Ubuntu I have around.

@jasagredo
Collaborator Author

I just checked on my full Ubuntu machine and get the same error, cabal_semaphore_25: semOpen: does not exist (No such file or directory). See this pastebin (the output is a bit long): https://pastebin.com/y0XPsAMT

@mpickering
Collaborator

@jasagredo Does this happen when building other projects as well? If I can reproduce in a docker container that would be good.

@jasagredo
Collaborator Author

I just checked with ouroboros-network, which I had around, and it also fails: https://pastebin.com/uruqaihZ

@cloudyluna

Hello. I'm also hitting this error on my system when trying to build a simple project generated with cabal init and relude as an extra dependency.

System information

  • OS: GNU/Linux OpenSUSE 15.5

  • Arch: x86_64

  • WSL or native: native

  • Haskell tools installed and used with: ghcup

  • ghcup version:

    $ ghcup --version
    The GHCup Haskell installer, version 0.1.22.0
    
  • ghc version:

    $ ghc --version
    The Glorious Glasgow Haskell Compilation System, version 9.8.2
    
  • cabal version:

    $ cabal --version
    cabal-install version 3.11.0.0
    compiled using version 3.12.0.0 of the Cabal library
    
  • libc version:

    $ ldd --version
    ldd (GNU libc) 2.31
    Copyright (C) 2020 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Written by Roland McGrath and Ulrich Drepper.
    
  • uname:

    $ uname -a
    Linux suse 5.14.21-150500.55.52-default #1 SMP PREEMPT_DYNAMIC Tue Mar 5 16:53:41 UTC 2024 (a62851f) x86_64 x86_64 x86_64 GNU/Linux
    

To reproduce

Run these commands in order.

  1. git clone https://github.com/cloudyluna/simple-semaphore-build
  2. cd simple-semaphore-build
  3. cabal build all --semaphore -v

Build log (failed with error)

https://gist.github.com/cloudyluna/b13e7d7569ffa3ad1804ba7ec76813b7

@mpickering
Collaborator

@wz1000 investigated this issue and the problem appears to be that the binaries are built and linked against musl but used on a system which uses glibc.

  • cabal-install is built against musl so creating a semaphore creates /dev/shm/cabal_semaphore_k
  • ghc is built against glibc, and attempts to read the semaphore from /dev/shm/sem.cabal_semaphore_k

musl and glibc don't agree on what the names of the semaphores are and therefore we get this error.
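A minimal shell sketch of the mismatch described above, assuming the default Linux layout where both libcs keep named semaphores under /dev/shm. shm_path is a hypothetical helper, not part of cabal or GHC: it prints the file that a given libc's sem_open would use for a name, following the two naming rules in the bullets, which shows why a semaphore created by a musl-linked cabal-install is not found by a glibc-linked ghc.

```shell
#!/bin/sh
# shm_path LIBC NAME
# Hypothetical helper: print the /dev/shm file that LIBC's sem_open uses for
# NAME, following the naming described in this issue:
#   musl:  /dev/shm/NAME
#   glibc: /dev/shm/sem.NAME
shm_path() {
  # Strip leading slashes; sem_open names are conventionally written "/name".
  sp_name=$(printf '%s' "$2" | sed 's|^/*||')
  case $1 in
    musl)  echo "/dev/shm/$sp_name" ;;
    glibc) echo "/dev/shm/sem.$sp_name" ;;
    *)     echo "shm_path: unknown libc '$1'" >&2; return 1 ;;
  esac
}

# cabal-install (musl) creates one file, ghc (glibc) looks for another.
echo "cabal-install creates: $(shm_path musl cabal_semaphore_1a)"
echo "ghc looks for:         $(shm_path glibc cabal_semaphore_1a)"
```

On an affected machine, running ls /dev/shm while a --semaphore build is in progress should show which of the two files actually exists.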

hasufell added a commit to haskell/ghcup-metadata that referenced this issue Jul 8, 2024
@hasufell
Member

hasufell commented Jul 8, 2024

> @wz1000 investigated this issue and the problem appears to be that the binaries are built and linked against musl but used on a system which uses glibc.
>
>   • cabal-install is built against musl so creating a semaphore creates /dev/shm/cabal_semaphore_k
>   • ghc is built against glibc, and attempts to read the semaphore from /dev/shm/sem.cabal_semaphore_k
>
> musl and glibc don't agree on what the names of the semaphores are and therefore we get this error.

This seems like a problematic coupling between GHC and cabal. How can this be fixed?

@mpickering
Collaborator

If GHC had a command line mode to create the semaphore and managed them itself then this coupling wouldn't exist. I can't think of any other options immediately.

@Bodigrim
Collaborator

Bodigrim commented Jul 8, 2024

Could Cabal degrade gracefully and switch to no-semaphore mode automatically?
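Until something like that exists in Cabal, the idea can be approximated from the outside. The sketch below is a hypothetical user-side wrapper (build_with_fallback is not a cabal command): it runs cabal build --semaphore, and only if that build fails with the semOpen error reported in this issue does it rerun the build with plain -j. Because cabal build is incremental, the retry should reuse whatever the first attempt already built.

```shell
#!/bin/sh
# build_with_fallback [CABAL BUILD ARGS...]
# Hypothetical wrapper, not a cabal feature: try a semaphore build first and
# fall back to -j only when it fails with the semOpen error from this issue.
build_with_fallback() {
  bwf_log=$(mktemp)
  if cabal build --semaphore "$@" >"$bwf_log" 2>&1; then
    bwf_status=0
  else
    bwf_status=$?
  fi
  cat "$bwf_log"   # replay the first attempt's output (only after it finishes)
  if [ "$bwf_status" -ne 0 ] && grep -q 'semOpen: does not exist' "$bwf_log"; then
    rm -f "$bwf_log"
    echo "build_with_fallback: semaphore unusable, retrying with -j" >&2
    cabal build -j "$@"   # the retry's exit status becomes the function's
    return
  fi
  rm -f "$bwf_log"
  return "$bwf_status"    # success, or an unrelated failure left unmasked
}
```

For example, build_with_fallback -O1 all. One drawback of doing this from outside cabal is that the first attempt's output only appears once that attempt has finished.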

@geekosaur
Collaborator

> If GHC had a command line mode to create the semaphore and managed them itself then this coupling wouldn't exist. I can't think of any other options immediately.

This is how it should have worked to begin with. Anything other than either being explicit about the semaphore to other clients or hiding it completely is deep into "implementation-defined behavior"-land.

@wz1000

wz1000 commented Jul 9, 2024

> If GHC had a command line mode to create the semaphore and managed them itself then this coupling wouldn't exist. I can't think of any other options immediately.

It seems like cabal also needs to open the semaphore and wait on it so that there are enough resources to spawn a new ghc process, so I don't see how exactly this would work. Are you suggesting more ghc commands to wait on the semaphore as well?
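The acquire-before-spawn step described here can be illustrated without POSIX semaphores at all. The sketch below deliberately swaps in a different technique, a FIFO pre-filled with N tokens (similar to the pipe-based jobserver in GNU make), to show the shape of the protocol: the parent takes a token before starting each job, and the job hands it back when it exits. run_with_slots is a hypothetical helper; it does not model how ghc itself uses the semaphore, and it assumes Linux, where opening a FIFO read-write does not block.

```shell
#!/bin/sh
# run_with_slots N JOB...
# Hypothetical helper: run each JOB (a shell command string) in the background,
# never more than N at a time, using a FIFO of tokens as a counting semaphore.
run_with_slots() {
  rws_slots=$1
  shift
  rws_dir=$(mktemp -d)
  mkfifo "$rws_dir/tokens"
  exec 3<>"$rws_dir/tokens"   # read-write open; does not block on Linux
  rm -r "$rws_dir"            # fd 3 keeps the FIFO alive after unlinking
  rws_i=0
  while [ "$rws_i" -lt "$rws_slots" ]; do
    echo token >&3            # pre-fill the pool with N tokens ("slots")
    rws_i=$((rws_i + 1))
  done
  for rws_job in "$@"; do
    read -r rws_tok <&3       # acquire a slot; blocks until a job returns one
    {
      eval "$rws_job" || true # a failing job must still return its token
      echo token >&3          # release the slot
    } &
  done
  wait                        # wait for every job before tearing down
  exec 3>&-
}
```

For example, run_with_slots 2 'sleep 1; echo a' 'sleep 1; echo b' 'echo c' starts the third job only after one of the first two has returned its token, which mirrors cabal needing a free slot before it can launch another ghc.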

One alternative is to vendor the sem_open etc. implementation from either musl or glibc into semaphore-compat, but these seem complicated enough to make this unattractive: https://git.musl-libc.org/cgit/musl/tree/src/thread/sem_open.c

Another is to detect in both ghc and cabal-install if the current executable is compiled with musl and if so prepend sem. to the name of the semaphore passed to sem_open.
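A sketch of that last option, reusing the naming rules described earlier in this issue. sem_open_name picks the string a program would pass to sem_open, prepending sem. only when it is linked against musl, and shm_path (repeated here so the block stands alone) shows that both sides then resolve to the same file. Both helpers are hypothetical, and the sketch assumes the libc is already known; how ghc and cabal-install would detect it is left open, as in the comment.

```shell
#!/bin/sh
# shm_path LIBC NAME: /dev/shm file LIBC's sem_open uses for NAME (per this issue).
shm_path() {
  sp_name=$(printf '%s' "$2" | sed 's|^/*||')
  case $1 in
    musl)  echo "/dev/shm/$sp_name" ;;
    glibc) echo "/dev/shm/sem.$sp_name" ;;
    *)     return 1 ;;
  esac
}

# sem_open_name LIBC NAME: hypothetical fix, i.e. the string to pass to sem_open.
sem_open_name() {
  son_name=$(printf '%s' "$2" | sed 's|^/*||')
  case $1 in
    musl)  echo "sem.$son_name" ;;  # musl adds no prefix, so supply glibc's
    glibc) echo "$son_name" ;;      # glibc already adds "sem."
    *)     return 1 ;;
  esac
}

name=cabal_semaphore_1a
cabal_file=$(shm_path musl "$(sem_open_name musl "$name")")   # musl cabal-install
ghc_file=$(shm_path glibc "$(sem_open_name glibc "$name")")   # glibc ghc
echo "$cabal_file"   # /dev/shm/sem.cabal_semaphore_1a
echo "$ghc_file"     # /dev/shm/sem.cabal_semaphore_1a
```

If both programs are linked against musl, both prepend sem. and still agree, so the rule only has to be applied consistently on each side.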

@wz1000

wz1000 commented Jul 17, 2024

@hasufell @geekosaur I give more info on the problem and outline some solutions in https://gitlab.haskell.org/ghc/ghc/-/issues/25087. It would be really nice if you could give some feedback on which solution would be preferable to the cabal maintainers.

Projects: none yet · Development: no branches or pull requests · 7 participants