Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation failure on centos 7 #761

Closed
hhugo opened this issue Jan 29, 2020 · 32 comments
Closed

Compilation failure on centos 7 #761

hhugo opened this issue Jan 29, 2020 · 32 comments

Comments

@hhugo
Copy link
Member

hhugo commented Jan 29, 2020

#=== ERROR while compiling lwt.5.1.1 ==========================================#
# context     2.0.5 | linux/x86_64 | ocaml-base-compiler.4.09.0 | https://opam.ocaml.org/#203e326c
# path        /usr/local/home/hugo/.opam/4.09.0/.opam-switch/build/lwt.5.1.1
# command     /usr/local/home/hugo/.opam/opam-init/hooks/sandbox.sh build dune build -p lwt -j 4
# exit-code   1
# env-file    /usr/local/home/hugo/.opam/log/lwt-6882-418370.env
# output-file /usr/local/home/hugo/.opam/log/lwt-6882-418370.out
### output ###
# [...]
# lwt_unix.h:101:1: error: unknown type name ‘CRITICAL_SECTION’
#  typedef CRITICAL_SECTION lwt_unix_mutex;
#  ^
#       ocamlc src/unix/lwt_libev_stubs.o (exit 2)
# (cd _build/default/src/unix && /usr/local/home/hugo/.opam/4.09.0/bin/ocamlc.opt -g -I /usr/local/home/hugo/.opam/4.09.0/lib/bytes -I /usr/local/home/hugo/.opam/4.09.0/lib/mmap -I /usr/local/home/hugo/.opam/4.09.0/lib/ocaml/threads -I /usr/local/home/hugo/.opam/4.09.0/lib/ocplib-endian -I /usr/local/home/hugo/.opam/4.09.0/lib/result -I /usr/local/home/hugo/.opam/4.0[...]
# In file included from lwt_libev_stubs.c:9:0:
# lwt_unix.h:100:1: error: unknown type name ‘DWORD’
#  typedef DWORD lwt_unix_thread;
#  ^
# lwt_unix.h:101:1: error: unknown type name ‘CRITICAL_SECTION’
#  typedef CRITICAL_SECTION lwt_unix_mutex;
#  ^

Note that I can install lwt.4.5 without issue

@aantron
Copy link
Collaborator

aantron commented Jan 29, 2020

Probably a duplicate of #760. Is this in a container or other environment I could easily access to reproduce it?

@aantron
Copy link
Collaborator

aantron commented Feb 2, 2020

I was not able to reproduce this in ocaml/opam2:centos-7-opam.

@aantron
Copy link
Collaborator

aantron commented Feb 2, 2020

I did 100 installs inside the container, and wasn't able to observe this.

Is this a sporadic failure, or does it happen consistently for you?

@hhugo
Copy link
Member Author

hhugo commented Feb 3, 2020

The issue happens consistently with the 5.* versions.
conf-libev is not installed

@aantron
Copy link
Collaborator

aantron commented Feb 3, 2020

I can't seem to eyeball it, nor reproduce it (if you can, please describe your setup or environment). I was also installing without conf-libev.

Here are some links.

The immediate code:

lwt/src/unix/lwt_unix.h

Lines 90 to 104 in 7be56db

#if defined(HAVE_PTHREAD)
#include <pthread.h>
typedef pthread_t lwt_unix_thread;
typedef pthread_mutex_t lwt_unix_mutex;
typedef pthread_cond_t lwt_unix_condition;
#else
typedef DWORD lwt_unix_thread;
typedef CRITICAL_SECTION lwt_unix_mutex;
typedef struct lwt_unix_condition lwt_unix_condition;
#endif

Changes between 4.5.0 and 5.0.0 (nothing looks directly related to me): f38996e#diff-e278928a82c89ddd5ca43ac7ec7e08c8.

The commit in that range that seems most relevant to me, though still likely a red herring: 047e01a#diff-1573f3888bdb422f2109b0c08ffff260

Recent changes to discover.ml, though none EDIT: one of them happened in 4.5.0...5.0.0: https://github.com/ocsigen/lwt/commits/master/src/unix/config/discover.ml

5.* versions

Have you tested specifically with 5.0.0, which is what I am taking the mention of 5.* to imply?

@hhugo
Copy link
Member Author

hhugo commented Feb 3, 2020

The issue only happens when installing lwt with opam. It seems to be related to sandboxing.

This is the command ran by opam:

/usr/home/hugo/.opam/opam-init/hooks/sandbox.sh build dune build -p lwt -j 4

@hhugo
Copy link
Member Author

hhugo commented Feb 3, 2020

I've tested all versions with major version 5.

@hhugo
Copy link
Member Author

hhugo commented Feb 3, 2020

I think I've figured out the issue. In my setup, the opam sandbox make opam (the binary) unavailable.
Setting the environment variable OPAM_USER_PATH_RO so that opam is reachable again fixes the issue. I don't understand why this is not an issue with versions < "5.*"

@aantron
Copy link
Collaborator

aantron commented Feb 3, 2020

It must be caused by this change: 17e04a9#diff-e278928a82c89ddd5ca43ac7ec7e08c8R401-R402. Thinking about the best course of action...

@aantron
Copy link
Collaborator

aantron commented Feb 3, 2020

However, I still can't eyeball why that would affect pthread detection, so perhaps that is not the cause. But I would suggest to start there debugging the discover script.

@dinosaure
Copy link
Member

Currently, I reproduced it with a Docker container ubuntu-18.06 and opam-2.0.0.

@aantron
Copy link
Collaborator

aantron commented Feb 12, 2020

I just cannot seem to be able to reproduce this. I created a fresh VM and installed Docker in it, then:

$ docker pull ocaml/opam2:ubuntu-18.04
$ docker run -it ocaml/opam2:ubuntu-18.04
> opam init
> opam --version
2.0.6
> ocaml -version
4.09.0
> sudo apt install m4
> which bwrap
/usr/bin/bwrap
> echo $OPAM_USER_PATH_RO

> opam install --deps-only lwt.5.1.1
> opam install lwt.5.1.1

This works. Can you try this? Do you see a difference between what I am doing and your commands? Is there something I should show or try?

@dinosaure
Copy link
Member

Ok, I tried to reproduce the result with a truly sandboxed OPAM - ocaml/opam2:ubuntu-18.04 does not allow (at the beginning) to use bwrap (even if it is installed).

Currently:

$ docker run -it --rm --privileged --cap-add=CAP_SYS_ADMIN ocaml/opam2:ubuntu-18.04
# rm -r ~/.opam
# rm ~/.opamrc
# sudo sysctl kernel.unprivileged_userns_clone=1
# opam init
# opam install lwt

I did the same things on centos-7 and did not get any bugs. However, it depends on how you installed OPAM. In my case (when I got the bug), I installed OPAM by hands with a bootstrap with opam-devel - and it seems that, by this way, sandbox.sh is not properly updated.

@aantron
Copy link
Collaborator

aantron commented Feb 13, 2020

@dinosaure, it's not clear to me from your post, do those commands trigger the Lwt install failure, or result in no failure? (edit: or something else?)

@dinosaure
Copy link
Member

Results in no failure, but if the problem is related to bubblewrap, this is the way to really use it.

@olafhering
Copy link
Contributor

The code in test.c is about pthread, but the actual compiler commandline includes a trailing '-lev'.

--- a/src/unix/config/discover.ml
+++ b/src/unix/config/discover.ml
@@ -446,7 +446,7 @@ struct
           C_library_flags.add_link_flags ["-lev"];
           Some true
         | _ ->
-          C_library_flags.detect context ~library:"ev";
+          C_library_flags.detect context ~library:"dl";
           compiles context code
       end
   }

@aantron
Copy link
Collaborator

aantron commented Feb 22, 2020

AFAICT the entire surrounding code, including the C code, is about libev:

let code = {|
#include <ev.h>
int main()
{
ev_default_loop(0);
return 0;
}
|}
in
match compiles context code ~link_flags:["-lev"] with
| Some true ->
C_library_flags.add_link_flags ["-lev"];
Some true
| _ ->
C_library_flags.detect context ~library:"ev";
compiles context code

What test.c are you referring to?

@olafhering
Copy link
Contributor

test.c comes probably from Configurator. The ~library argument from the libev test is applied to each further test. I think the Not_found case triggers, then extend [] ["-l" ^ library] is called anyway.

--- a/src/unix/config/discover.ml
+++ b/src/unix/config/discover.ml
@@ -255,7 +255,7 @@ struct
             ["-I" ^ (path // "include")]
             ["-L" ^ (path // "lib"); "-l" ^ library]
         with Not_found ->
-          extend [] ["-l" ^ library]
+          extend [] [] (*["-l" ^ library]*)
 
   let ws2_32_lib context =
     if Configurator.ocaml_config_var_exn context "os_type" = "Win32" then

@aantron
Copy link
Collaborator

aantron commented Feb 22, 2020

The arguments are indeed accumulated and applied to further tests at the moment. However, I am unable to follow the rest of the description. Which Not_found case? What is the specific sequence of events that you are suggesting occurs, how does it lead to this bug, or to another bug you may be observing? To be fair, perhaps not every bug report or comment needs such detail, but I wasn't able to infer these things.

@aantron
Copy link
Collaborator

aantron commented Feb 22, 2020

Also, are you observing this specific issue? If so, could you describe your environment or what steps you take to reproduce it, as I haven't been able to do so myself?

@olafhering
Copy link
Contributor

This is with dune 1.11.4 and ocaml-4.05.

According to my strace of "dune build --no-buffer --verbose --profile release @install" the option -lev is appended to every call of gcc. How are the tests after libev are supposed to succeed if the non-existant libev.so is supposed to be linked to each of them? The small change above will not add an non-existant lib to the global link_flags.

@olafhering
Copy link
Contributor

Is there a libev.so on your system, which would mask the bug?

@aantron
Copy link
Collaborator

aantron commented Feb 22, 2020

No, the reason this bug is masked (I haven't yet confirmed it, but I am starting to see what you mean), is that almost everyone is using opam, and we have conf-libev not installed if there is no libev.so, which results in the libev test not being run in the first place. However, this seems like a separate (potential) problem from the one in this issue. If real, I will rewrite discover.ml not to share state between tests like it currently does, because it is getting too difficult to understand as the tests become more and more elaborate over time.

@aantron
Copy link
Collaborator

aantron commented Feb 22, 2020

...or perhaps it the same. I would really appreciate a direct answer to whether you are observing the problem in this specific issue, or else a paste of your output, as it helps me to reason about the other messages.

Also, are you observing this specific issue?

@olafhering
Copy link
Contributor

Yes, also for me compilation fails because pthread is not detected, which leads to DWORD not found.

@aantron
Copy link
Collaborator

aantron commented Feb 22, 2020

Thanks :) I'm about to try your command in a container, and I've started to see what the bug likely is. Hopefully, I'll soon have reproduced it, and, if so, I'll probably fix it with the rewrite/refactor over the weekend or at the start of next week.

@aantron
Copy link
Collaborator

aantron commented Feb 22, 2020

I wasn't able to reproduce this. I did:

docker pull ocaml/opam2
docker run -it ocaml/opam2
# continuing in the container
sudo apt install m4
cd ..
git clone https://github.com/ocsigen/lwt.git
opam install dune.1.11.4
opam install --deps-only .
dune build --no-buffer --verbose --profile release @install

The build succeeded. I saw in the output that lwt.unix was, indeed, built. The container had OCaml 4.09.0, but I highly doubt the difference between 4.09.0 and 4.05.0 is significant for this issue.

I checked using find . -name '*libev*' that there is no libev in either /lib or /usr/lib, and also that opam install conf-libev, which searches for libev, failed. Also, _build/default/src/unix/lwt_features.ml has let _HAVE_LIBEV = false (and let _HAVE_PTHREAD = true).

I have enough from your comments to look into how else to trigger this for now, I think. And discover.ml needs the refactoring in any case, even if only to avoid confusing us while looking into this issue.

In the meantime, can you tell me more about your environment? I understand that you are installing dependencies manually and not through opam. Is there a reasonable way for me to access the state before the dune build command?

@aantron
Copy link
Collaborator

aantron commented Feb 22, 2020

Ah, right after writing that, it occurred to me to delete the opam binary right before the dune build, and now I can reproduce it. Thanks again!

aantron pushed a commit that referenced this issue Feb 23, 2020
@aantron
Copy link
Collaborator

aantron commented Feb 23, 2020

@olafhering I applied your patch (with slight edit) from #761 (comment), and it seems to have fixed the issue. Once I started looking into refactoring discover.ml, I realized that the same patch is all that will be needed for now, anyway. So, thanks yet again :)

I assume you've already confirmed that this fixes it for you. If not, please try the discover-flags branch. @hhugo, @dinosaure, if you have time, could you please also check if the discover-flags branch fixes the problem for you?

I will merge this shortly and do a release.

As for why the broken call to extend was in there, IIRC that's a holdover from an earlier version of discover.ml that checked flag sources (just trying the flag, pkg_config, filesystem search, etc.) in a different order from now. That extend call indeed doesn't make sense anymore.

That it was broken was masked by several factors, but other recent changes to discover.ml unmasked it.

@aantron
Copy link
Collaborator

aantron commented Feb 23, 2020

There will still be the problem (not yet reported) that if opam is not runnable, but it is still an opam build, discover.ml can't determine whether conf-libev is installed. So, I'll make another change, so that discover.ml doesn't have to run opam to do this. And, I'll change the preprocessor directives so that the error from this issue explains that libpthread wasn't found, rather than just looking like the OS was detected as Windows.

@olafhering
Copy link
Contributor

This change works for me. Thanks!

aantron added a commit that referenced this issue Feb 23, 2020
Instead, pass information about the presence of conf-libev down from
opam on the discover.ml command line.

See #761 (comment).
@aantron
Copy link
Collaborator

aantron commented Mar 4, 2020

The fix was released in Lwt 5.1.2 about a week ago.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants