Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coq build is broken in OSX. #11316

Closed
ejgallego opened this issue Jan 28, 2018 · 17 comments
Closed

coq build is broken in OSX. #11316

ejgallego opened this issue Jan 28, 2018 · 17 comments

Comments

@ejgallego
Copy link
Contributor

@ejgallego ejgallego commented Jan 28, 2018

It seems that bedf285 broke the build of Coq in OSX.

See for example: https://travis-ci.org/ocaml/opam-repository/jobs/334441912

This will likely require action in backporting upstream coq/coq#6518

An extremely annoying aspect of OPAM is these "updates" to already existing packages without increasing the version numbering. This indeed looks to me like a critical design flaw. Now, it turns out that some users will have num.1.1 in their system however there is no way to know if the have a 1.1 after or before bedf285 was committed.

So it seems to me that it is impossible to debug problems like these reliably.

@dbuenzli
Copy link
Collaborator

@dbuenzli dbuenzli commented Jan 28, 2018

An extremely annoying aspect of OPAM is these "updates" to already existing packages without increasing the version numbering.

This is discussed here #10531

@ejgallego
Copy link
Contributor Author

@ejgallego ejgallego commented Jan 28, 2018

@RalfJung
Copy link
Contributor

@RalfJung RalfJung commented Jan 29, 2018

Now, it turns out that some users will have num.1.1 in their system however there is no way to know if the have a 1.1 after or before bedf285 was committed.

Is there any way the num maintainer could make a 1.1-1 release or so to indicate this, and revert 1.1 back to the old state? Even if this can (and probably should?) be fixed on the Coq side; I feel like a change breaking reverse dependencies like this shouldn't happen without some kind of version bump.

@ejgallego
Copy link
Contributor Author

@ejgallego ejgallego commented Jan 29, 2018

@RalfJung the situation has been like this in OPAM for a long time, I guess we can open a separate bug requesting that. On the other hand, we should try to fix Coq's package, unfortunately I understand none of us both are OSX users so testing is gonna be hard. Maybe @herbelin could help here?

@herbelin
Copy link
Contributor

@herbelin herbelin commented Jan 29, 2018

@ejgallego: If I summarize my understanding of the issue, there was a change in how num is installed with ocaml 4.06.0 which makes that 8.7.1 is not able any more to find nums.cmxa in the path, is that correct? What is specific to OS X then? E.g., why doesn't it happen under Linux as well?

Incidentally, would forcing the coq opam packages to use version 1.0 of num help?

@ejgallego
Copy link
Contributor Author

@ejgallego ejgallego commented Jan 29, 2018

@ejgallego: If I summarize my understanding of the issue, there was a change in how num is installed with ocaml 4.06.0 which makes that 8.7.1 is not able any more to find nums.cmxa in the path, is that correct? What is specific to OS X then? E.g., why doesn't it happen under Linux as well?

Indeed it seems specific to OSX. I am not sure why it only happens there; indeed it would be good if we could figure out. The problem was caused by (an unversioned) change to the num package so it is indeed very hard to debug/track now.

Incidentally, would forcing the coq opam packages to use version 1.0 of num help?

IMHO this dependency would prove too restrictive so I advise against it. It seems the problem is indeed in Coq so the easiest fix for us would be to backport the ocamlfind patch scheduled for 8.7.2.

@herbelin
Copy link
Contributor

@herbelin herbelin commented Jan 29, 2018

It seems the problem is indeed in Coq so the easiest fix for us would be to backport the ocamlfind patch scheduled for 8.7.2.

Looks reasonable.

IMHO this dependency would prove too restrictive so I advise against it.

That could however be a tradeoff for 8.7.1 to work on OS X?

@ejgallego
Copy link
Contributor Author

@ejgallego ejgallego commented Jan 29, 2018

IMHO this dependency would prove too restrictive so I advise against it.

That could however be a tradeoff for 8.7.1 to work on OS X?

IMHO it would make sense for num to be reverted to a sane version and then the change implemented properly.

@dra27
Copy link
Contributor

@dra27 dra27 commented Jan 29, 2018

Sorry that the change has caused a problem, but the issue was not with the num release, but with the way num was packaged in opam (which, while still something that could be worked on, is why the version was not bumped - we don't at present have package versions in opam-repository to release, say, num.1.1-2).

The issue was that opam-repository was allowing the num package to install things to /usr/local which is fundamentally incorrect (and causes other things to break further down the line). The fix in upstream is to provide an option to do a pure findlib install. This was backported to fix the installation of num to system switches.

This issue affects mainly OSX because /usr/local is often user-writeable, whereas on Linux it usually isn't (and the use of system switches is less common because they're typically too old).

The workaround for Coq should be to require a non-system switch (or to detect Num via ocamlfind, but perhaps that's less practical?).

@ejgallego
Copy link
Contributor Author

@ejgallego ejgallego commented Jan 29, 2018

Thanks @dra27 for the explanation, I'm not sure this still explains the discrepancy, as far as I can tell num is supposed to be installed in a compatible way even under a prefix [that is to say, under lib/ocaml in addition of under lib/num. That's what does on Linux and it seems to avoid the problem.

@Zimmi48
Copy link
Contributor

@Zimmi48 Zimmi48 commented Jan 29, 2018

The workaround for Coq should be to require a non-system switch (or to detect Num via ocamlfind, but perhaps that's less practical?).

The suggested solution sound overly restrictive. And as @ejgallego said, the bug fix has already been merged and is scheduled for 8.7.2 (coq/coq@fc76dd1). Maybe indeed, this patch could be shipped with the OPAM package for Coq 8.7.1.

@ejgallego
Copy link
Contributor Author

@ejgallego ejgallego commented Jan 29, 2018

Well, maybe Linux builds work by chance as they are picking out a different num package? I'm afraid we cannot know.

@ejgallego
Copy link
Contributor Author

@ejgallego ejgallego commented Jan 29, 2018

The suggested solution sound overly restrictive. And as @ejgallego said, the bug fix has already been merged and is scheduled for 8.7.2 (coq/coq@fc76dd1). Maybe indeed, this patch could be shipped with the OPAM package for Coq 8.7.1.

This seems the best choice, the current opam package has indeed support for patches.

@dra27
Copy link
Contributor

@dra27 dra27 commented Jan 29, 2018

The Linux builds are very unlikely to be using system switches - when installed to a non-system switch, the num package installs files to OCaml's lib directory, as always

@dra27
Copy link
Contributor

@dra27 dra27 commented Jan 29, 2018

Just to expand on that, having looked at the log. There's no discrepancy between Linux/OSX here - all the other Travis Linux jobs build the compiler, and so the num.1.1 installs compatibly, as you say. On OSX, OCaml compiler is installed using Homebrew (since it has 4.06.0), and so it does not - because opam packages should never write outside $OPAMROOT.

#11245 adds testing for system switches on Linux as well - see job 17673.1 in https://travis-ci.org/ocaml/opam-repository/builds/330355502 and I can guarantee that this Coq package would have failed in just the same way in that job as it does on the OSX job you've linked to.

In this particular instance, package versioning wouldn't have helped, although it might have made what was changing clearer - if we assume that num.1.1-1 was the original and my revised version was num.1.1-2, then we would have deleted or disabled num.1.1-1 because it was doing something very broken (and which macOS users who need the num package are going to find tedious to fix next time they update ocamlfind...)

@avsm
Copy link
Member

@avsm avsm commented Jan 29, 2018

The reason that OSX/Homebrew specifically breaks here is because there is a system and user-writeable /usr/local, which means that a packaging error in opam propagates and contaminates the system switch.

@ejgallego
Copy link
Contributor Author

@ejgallego ejgallego commented Jan 29, 2018

Ah! Indeed that explains it, thanks a lot @dra27 !

I guess the versioning complain was more about not being able to rollback the package for our own testing than about building a correct package.

ejgallego added a commit to ejgallego/opam-repository that referenced this issue Jan 30, 2018
Backports coq/coq@fc76dd1 to 8.7.1 to fix ocaml#11316.
ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018
It seems that recent fixes using ocamlbuild to locate num missed to
switch the csdpcert binary. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316
ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018
It seems that recent fixes using ocamlbuild to locate num missed to
switch the csdpcert binary. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316
ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018
It seems that recent fixes using ocamlbuild to locate num missed to
switch the csdpcert binary. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316
ejgallego added a commit to ejgallego/opam-repository that referenced this issue Jan 30, 2018
Backports coq/coq@fc76dd1 to 8.7.1 to fix ocaml#11316.
ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018
It seems that recent fixes using ocamlbuild to locate num missed to
switch the csdpcert binary. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316
ejgallego added a commit to ejgallego/opam-repository that referenced this issue Jan 30, 2018
Backports coq/coq@fc76dd1 to 8.7.1 to fix ocaml#11316.
ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018
It seems that recent fixes using ocamlbuild to locate num missed to
switch in some binaries. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316
ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018
It seems that recent fixes using ocamlbuild to locate num missed to
switch in some binaries. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316
ejgallego added a commit to ejgallego/opam-repository that referenced this issue Jan 30, 2018
Backports coq/coq@fc76dd1 to 8.7.1 to fix ocaml#11316.
ejgallego added a commit to ejgallego/opam-repository that referenced this issue Feb 6, 2018
Backports coq/coq@fc76dd1 to 8.7.1 to fix ocaml#11316.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

7 participants