Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`coq` build is broken in OSX. #11316

Closed
ejgallego opened this issue Jan 28, 2018 · 17 comments

Comments

Projects
None yet
7 participants
@ejgallego
Copy link
Contributor

commented Jan 28, 2018

It seems that bedf285 broke the build of Coq in OSX.

See for example: https://travis-ci.org/ocaml/opam-repository/jobs/334441912

This will likely require action in backporting upstream coq/coq#6518

An extremely annoying aspect of OPAM is these "updates" to already existing packages without increasing the version numbering. This indeed looks to me like a critical design flaw. Now, it turns out that some users will have num.1.1 in their system however there is no way to know if the have a 1.1 after or before bedf285 was committed.

So it seems to me that it is impossible to debug problems like these reliably.

@dbuenzli

This comment has been minimized.

Copy link
Collaborator

commented Jan 28, 2018

An extremely annoying aspect of OPAM is these "updates" to already existing packages without increasing the version numbering.

This is discussed here #10531

@ejgallego

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2018

@RalfJung

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2018

Now, it turns out that some users will have num.1.1 in their system however there is no way to know if the have a 1.1 after or before bedf285 was committed.

Is there any way the num maintainer could make a 1.1-1 release or so to indicate this, and revert 1.1 back to the old state? Even if this can (and probably should?) be fixed on the Coq side; I feel like a change breaking reverse dependencies like this shouldn't happen without some kind of version bump.

@ejgallego

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

@RalfJung the situation has been like this in OPAM for a long time, I guess we can open a separate bug requesting that. On the other hand, we should try to fix Coq's package, unfortunately I understand none of us both are OSX users so testing is gonna be hard. Maybe @herbelin could help here?

@herbelin

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2018

@ejgallego: If I summarize my understanding of the issue, there was a change in how num is installed with ocaml 4.06.0 which makes that 8.7.1 is not able any more to find nums.cmxa in the path, is that correct? What is specific to OS X then? E.g., why doesn't it happen under Linux as well?

Incidentally, would forcing the coq opam packages to use version 1.0 of num help?

@ejgallego

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

@ejgallego: If I summarize my understanding of the issue, there was a change in how num is installed with ocaml 4.06.0 which makes that 8.7.1 is not able any more to find nums.cmxa in the path, is that correct? What is specific to OS X then? E.g., why doesn't it happen under Linux as well?

Indeed it seems specific to OSX. I am not sure why it only happens there; indeed it would be good if we could figure out. The problem was caused by (an unversioned) change to the num package so it is indeed very hard to debug/track now.

Incidentally, would forcing the coq opam packages to use version 1.0 of num help?

IMHO this dependency would prove too restrictive so I advise against it. It seems the problem is indeed in Coq so the easiest fix for us would be to backport the ocamlfind patch scheduled for 8.7.2.

@herbelin

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2018

It seems the problem is indeed in Coq so the easiest fix for us would be to backport the ocamlfind patch scheduled for 8.7.2.

Looks reasonable.

IMHO this dependency would prove too restrictive so I advise against it.

That could however be a tradeoff for 8.7.1 to work on OS X?

@ejgallego

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

IMHO this dependency would prove too restrictive so I advise against it.

That could however be a tradeoff for 8.7.1 to work on OS X?

IMHO it would make sense for num to be reverted to a sane version and then the change implemented properly.

@dra27

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2018

Sorry that the change has caused a problem, but the issue was not with the num release, but with the way num was packaged in opam (which, while still something that could be worked on, is why the version was not bumped - we don't at present have package versions in opam-repository to release, say, num.1.1-2).

The issue was that opam-repository was allowing the num package to install things to /usr/local which is fundamentally incorrect (and causes other things to break further down the line). The fix in upstream is to provide an option to do a pure findlib install. This was backported to fix the installation of num to system switches.

This issue affects mainly OSX because /usr/local is often user-writeable, whereas on Linux it usually isn't (and the use of system switches is less common because they're typically too old).

The workaround for Coq should be to require a non-system switch (or to detect Num via ocamlfind, but perhaps that's less practical?).

@ejgallego

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

Thanks @dra27 for the explanation, I'm not sure this still explains the discrepancy, as far as I can tell num is supposed to be installed in a compatible way even under a prefix [that is to say, under lib/ocaml in addition of under lib/num. That's what does on Linux and it seems to avoid the problem.

@Zimmi48

This comment has been minimized.

Copy link

commented Jan 29, 2018

The workaround for Coq should be to require a non-system switch (or to detect Num via ocamlfind, but perhaps that's less practical?).

The suggested solution sound overly restrictive. And as @ejgallego said, the bug fix has already been merged and is scheduled for 8.7.2 (coq/coq@fc76dd1). Maybe indeed, this patch could be shipped with the OPAM package for Coq 8.7.1.

@ejgallego

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

Well, maybe Linux builds work by chance as they are picking out a different num package? I'm afraid we cannot know.

@ejgallego

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

The suggested solution sound overly restrictive. And as @ejgallego said, the bug fix has already been merged and is scheduled for 8.7.2 (coq/coq@fc76dd1). Maybe indeed, this patch could be shipped with the OPAM package for Coq 8.7.1.

This seems the best choice, the current opam package has indeed support for patches.

@dra27

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2018

The Linux builds are very unlikely to be using system switches - when installed to a non-system switch, the num package installs files to OCaml's lib directory, as always

@dra27

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2018

Just to expand on that, having looked at the log. There's no discrepancy between Linux/OSX here - all the other Travis Linux jobs build the compiler, and so the num.1.1 installs compatibly, as you say. On OSX, OCaml compiler is installed using Homebrew (since it has 4.06.0), and so it does not - because opam packages should never write outside $OPAMROOT.

#11245 adds testing for system switches on Linux as well - see job 17673.1 in https://travis-ci.org/ocaml/opam-repository/builds/330355502 and I can guarantee that this Coq package would have failed in just the same way in that job as it does on the OSX job you've linked to.

In this particular instance, package versioning wouldn't have helped, although it might have made what was changing clearer - if we assume that num.1.1-1 was the original and my revised version was num.1.1-2, then we would have deleted or disabled num.1.1-1 because it was doing something very broken (and which macOS users who need the num package are going to find tedious to fix next time they update ocamlfind...)

@avsm

This comment has been minimized.

Copy link
Member

commented Jan 29, 2018

The reason that OSX/Homebrew specifically breaks here is because there is a system and user-writeable /usr/local, which means that a packaging error in opam propagates and contaminates the system switch.

@ejgallego

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

Ah! Indeed that explains it, thanks a lot @dra27 !

I guess the versioning complain was more about not being able to rollback the package for our own testing than about building a correct package.

ejgallego added a commit to ejgallego/opam-repository that referenced this issue Jan 30, 2018

ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018

[make] Fix CSDP build in split `num` configuration.
It seems that recent fixes using ocamlbuild to locate num missed to
switch the csdpcert binary. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316

ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018

[make] Fix CSDP build in split `num` configuration.
It seems that recent fixes using ocamlbuild to locate num missed to
switch the csdpcert binary. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316

ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018

[make] Fix CSDP build in split `num` configuration.
It seems that recent fixes using ocamlbuild to locate num missed to
switch the csdpcert binary. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316

ejgallego added a commit to ejgallego/opam-repository that referenced this issue Jan 30, 2018

ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018

[make] Fix CSDP build in split `num` configuration.
It seems that recent fixes using ocamlbuild to locate num missed to
switch the csdpcert binary. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316

ejgallego added a commit to ejgallego/opam-repository that referenced this issue Jan 30, 2018

ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018

[make] Fix build in split `num` configuration.
It seems that recent fixes using ocamlbuild to locate num missed to
switch in some binaries. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316

ejgallego added a commit to ejgallego/coq that referenced this issue Jan 30, 2018

[make] Fix build in split `num` configuration.
It seems that recent fixes using ocamlbuild to locate num missed to
switch in some binaries. This is needed to solve problems like the
one in this issue: ocaml/opam-repository#11316

ejgallego added a commit to ejgallego/opam-repository that referenced this issue Jan 30, 2018

ejgallego added a commit to ejgallego/opam-repository that referenced this issue Feb 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.