dune 1.11.0 fails to build on armv7 with odd relocation errors #2527

Open
rwmjones opened this issue Aug 8, 2019 · 22 comments


@rwmjones

commented Aug 8, 2019

The log is here:

https://kojipkgs.fedoraproject.org//work/tasks/2862/36862862/build.log

The same architecture with 1.10:

https://kojipkgs.fedoraproject.org//packages/ocaml-dune/1.10.0/5.fc31/data/logs/armv7hl/build.log

It seems as if we need to pass -fPIC to the ocamlopt invocation (that may not be the solution, but I'd like to try it). However, I can't work out how to do that with this very complex build/bootstrap system.

@diml

Member

commented Aug 8, 2019

The error seems to trigger after the bootstrap. You can add a dune file at the root with the following contents to add -fPIC everywhere:

(env (_ (ocamlopt_flags :standard -fPIC)))

Is it the same OCaml version in both cases, BTW?

@rwmjones

Author

commented Aug 8, 2019

Yes. The builds were only a couple of days apart. I'll try the dune file idea, thanks.

@rwmjones

Author

commented Aug 8, 2019

Unfortunately that didn't fix it. The option has been added, but the error is similar so I'm not sure what's going on. Latest log: https://kojipkgs.fedoraproject.org//work/tasks/3158/36863158/build.log

@diml

Member

commented Aug 8, 2019

Is it possible to get the contents of the _boot/log file? Diffing the build commands between the successful and failing builds might reveal something.

@rwmjones

Author

commented Aug 8, 2019

It's difficult to compare them because the output is kind of mixed up (plus I can't run this directly on the hardware but need to use the Fedora build system):
http://oirase.annexia.org/tmp/boot_log_good.txt
http://oirase.annexia.org/tmp/boot_log_bad.txt
Edit: I should note this is without using -fPIC

@rwmjones

Author

commented Aug 8, 2019

I'm going to temporarily (cough) disable armv7 builds ...

@diml

Member

commented Aug 8, 2019

Thanks. I diffed the files after passing them through grep '^\$' | sort. Looking at the failures in the original log and at the diff, I noticed that the failing commands are passed -nodynlink, which we were not passing in 1.10. That's likely to be the issue.

I remember this change. The idea is that for executables, unless the shared_object mode is requested there is no point in generating dynlinkable code. So we pass -nodynlink in order to produce slightly better code. Not sure why that's breaking the build on armv7 though...

I wrote a quick patch to disable this optimisation in 1.11: diml@15c04b0

Would you be able to try it to validate this hypothesis?
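
For anyone who wants to repeat the comparison described above, here is a minimal sketch. It assumes the two logs linked earlier have been saved locally as boot_log_good.txt and boot_log_bad.txt, and that command lines in them start with `$`, as the grep pattern implies. The function names commands_of and diff_commands are made up for this illustration.

```shell
# Hypothetical helper: keep only the command lines (those starting with "$")
# of a _boot/log-style file and sort them, so interleaved output and
# ordering differences between builds disappear.
commands_of() {
  grep '^\$' "$1" | sort
}

# Hypothetical helper: diff the sorted command lines of two logs.
# Lines prefixed with "<" occur only in the first log, ">" only in the second.
diff_commands() {
  good_sorted=$(mktemp)
  bad_sorted=$(mktemp)
  commands_of "$1" > "$good_sorted"
  commands_of "$2" > "$bad_sorted"
  diff "$good_sorted" "$bad_sorted"
  status=$?
  rm -f "$good_sorted" "$bad_sorted"
  return $status
}
```

Running `diff_commands boot_log_good.txt boot_log_bad.txt | grep -- -nodynlink` should then narrow the output to the differing commands that mention the flag.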

@nojb

Collaborator

commented Aug 9, 2019

Looking at the error

/usr/bin/ld: bin/main/.main_jbuilder.eobjs/native/build_info__Build_info_data.o: relocation R_ARM_THM_MOVW_ABS_NC against `caml_int_of_string' can not be used when making a shared object; recompile with -fPIC

makes me think that the problem is that the modules for executables are compiled in non-PIC mode but are being linked with PIC libraries. In particular, the compiler uses a movw/movt pair to load the address of global symbols when in -nodynlink mode:
https://github.com/ocaml/ocaml/blob/8c1107d910fb9e3ca07edfe922b56acaa69bdf74/asmcomp/arm/emit.mlp#L389-L391 which seems like it could be problematic for symbols coming from PIC libraries (but I am not an expert, so take this with a grain of salt).

@rwmjones

Author

commented Aug 9, 2019

I'm testing the patch now.

With respect to the previous comment, note that Fedora compiles virtually every C file with -fPIC. This includes libasmrun.a which contains objects compiled with:

gcc -c -O2 -fno-strict-aliasing -fwrapv -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard -Wall -Werror -g -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE  -DOCAML_STDLIB_DIR='"/usr/lib/ocaml"'  -DNATIVE_CODE -DTARGET_arm -DMODEL_armv7 -DSYS_linux_eabihf   -o startup_aux_n.o startup_aux.c

where /usr/lib/rpm/redhat/redhat-hardened-cc1 is:

*cc1_options:
+ %{!r:%{!fpie:%{!fPIE:%{!fpic:%{!fPIC:%{!fno-pic:-fPIE}}}}}}

The reason for this is of course hardening and in this case the wish to make every binary in the distribution use PIE.
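
As a reading aid for the spec line above: in GCC spec syntax, `%{!S:X}` substitutes X only when switch -S was not given, so the nested form adds -fPIE unless -r or one of the listed PIC/PIE switches is already on the command line. The sketch below only models that decision; the function name hardened_extra_flag is hypothetical and nothing here is taken from the Fedora tooling itself.

```shell
# Hypothetical model of the spec line: print "-fPIE" when the hardened spec
# would add it for the given compiler arguments, and print nothing otherwise.
hardened_extra_flag() {
  for arg in "$@"; do
    case "$arg" in
      # Any of these switches suppresses the implicit -fPIE.
      -r|-fpie|-fPIE|-fpic|-fPIC|-fno-pic) return 0 ;;
    esac
  done
  echo "-fPIE"
}
```

For instance, `hardened_extra_flag -c -O2 startup_aux.c` prints -fPIE, while adding -fPIC or -fno-pic to the arguments prints nothing.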

@rwmjones

Author

commented Aug 9, 2019

No, that doesn't seem to have worked. New build log: https://kojipkgs.fedoraproject.org//work/tasks/8832/36878832/build.log

@glondu


commented Aug 9, 2019

I've just updated dune to 1.11.0 in Debian unstable, and it fails with a similar error:

https://buildd.debian.org/status/fetch.php?pkg=ocaml-dune&arch=armhf&ver=1.11.0-1&stamp=1565333162&raw=0

Note that it fails on armhf only, and the previous version (1.6.2) worked there. And we are still using OCaml 4.05.0 in unstable.

I've tried @diml's patch and it still fails.

@glondu


commented Aug 9, 2019

About -nodynlink: when I compile a simple hello.ml file:

print_endline "Hello world!"

with ocamlopt -nodynlink hello.ml, I get similar errors that go away when I remove -nodynlink. So it might be that @diml's hypothesis is right, but the patch is wrong.
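
The reproduction above can be scripted roughly as follows. This is a sketch, assuming ocamlopt is on PATH; the helper names write_hello and try_build are invented for the example, and the link failure is only reported in this thread for armv7/armhf.

```shell
# Write the one-line test program quoted above into directory $1.
write_hello() {
  printf 'print_endline "Hello world!"\n' > "$1/hello.ml"
}

# Build hello.ml in directory $1 with any extra ocamlopt flags given after it,
# and report whether compiling and linking succeeded.
try_build() {
  dir=$1
  shift
  if (cd "$dir" && ocamlopt "$@" hello.ml -o hello) >/dev/null 2>&1; then
    echo "ok: ocamlopt $*"
  else
    echo "FAILED: ocamlopt $*"
  fi
}
```

After `dir=$(mktemp -d); write_hello "$dir"`, comparing `try_build "$dir"` with `try_build "$dir" -nodynlink` shows whether the flag alone makes the difference on a given machine.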

@nojb

Collaborator

commented Aug 9, 2019

I think @diml's patch is wrong, it should set dynlink = true instead of false. Could you try it?

@rwmjones

Author

commented Aug 9, 2019

In case it isn't clear, on Fedora it also only fails on armv7, but succeeds on all other architectures. I will try to update diml's patch as suggested.

@glondu


commented Aug 9, 2019

I think @diml's patch is wrong, it should set dynlink = true instead of false.

Of course...

Could you try it?

It works! \o/

@rwmjones

Author

commented Aug 9, 2019

Yes, I can also confirm the updated patch works.

ocamlopt -nodynlink hello.ml

So this sounds as if it could be an upstream bug on armv7? Given that it affects both Fedora and Debian, it's not likely to be because of hardening flags ...

@glondu


commented Aug 9, 2019

Given that it affects both Fedora and Debian, it's not likely to be because of hardening flags ...

We also do hardening in Debian. I don't know the details, though.

@nojb

Collaborator

commented Aug 9, 2019

Yes, looks like a compiler bug; see https://sourceware.org/ml/binutils/2009-04/msg00395.html where it is explained (I think) that one should not use MOVW/MOVT to load addresses of symbols from PIC libraries from non-PIC code.

@nojb

Collaborator

commented Aug 9, 2019

About -nodynlink: when I compile a simple hello.ml file:

print_endline "Hello world!"

with ocamlopt -nodynlink hello.ml, I get similar errors that go away when I remove -nodynlink.

@glondu could you open an issue on https://github.com/ocaml/ocaml with this bug?

@diml

Member

commented Aug 12, 2019

Oops, thanks @nojb for spotting the mistake in my patch! Booleans are hard, especially after 🍷 haha.

What should we do about this then? Should we consider that it's a compiler bug and do nothing in Dune, or disable the optimisation at least in Dune 1.x?

@nojb

Collaborator

commented Aug 17, 2019

I may be wrong, but it feels like https://discuss.ocaml.org/t/program-segfaults-when-compiled-with-dune-1-11/4254 is also caused by the -nodynlink change.

@diml

Member

commented Aug 19, 2019

Alright, well let's disable this optimisation for now until we figure out the proper way to fix this. I pushed a commit to disable it in both 1.11 and master.

rgrinberg added a commit to rgrinberg/opam-repository that referenced this issue Aug 20, 2019
[new release] dune-build-info and dune (1.11.2)
CHANGES:

- Remove the optimisation of passing `-nodynlink` for executables when
  not necessary. It seems to be breaking things (see ocaml/dune#2527, @diml)

- Fix invalid library names in `dune-package` files. Only public names should
  exist in such files. (ocaml/dune#2558, fix ocaml/dune#2425, @rgrinberg)