(Re-)enable building on illumos (SmartOS, OmniOS, ...) and Solaris #10063

Merged
merged 23 commits into from Jan 14, 2021

tleedjarv
Contributor

This PR replaces the now unmaintained #9087.

There are still several active OS distributions based on OpenSolaris and illumos; SmartOS, OmniOS and OpenIndiana are the best known. OCaml works well on these platforms. See also #2024 (comment) and #2024 (comment)
OCaml is also included in pkgsrc which is available on these platforms.

In this PR, build configuration is provided only for 64-bit and GCC.

Explanation of the changes in the systhreads sources

The issue with _POSIX_PTHREAD_SEMANTICS is that even though it is defined in otherlibs/systhreads/st_posix.h, it is too late because otherlibs/systhreads/st_stubs.c includes caml/signals.h before st_posix.h. Due to this, there is no point in defining _POSIX_PTHREAD_SEMANTICS in st_posix.h.

There are two solutions to this. First, rely on autoconf setting PTHREAD_CFLAGS and include it in the Makefile. Alternatively, add the definition in st_stubs.c.

Defining _POSIX_PTHREAD_SEMANTICS in st_stubs.c makes compiling it independent of autoconf and Makefile, if that is what is desired. On the other hand, the definition may be seen as the responsibility of the build system. In this PR, both of these solutions are present for review (and may remain as is, since they don't conflict with each other).

@tleedjarv
Contributor Author

I have added a commit to enable stack overflow detection and the naked pointers checker on these platforms. These are not needed to enable building, so they can be removed from the PR if issues are found during review.

@tleedjarv
Contributor Author

@jperkin I would welcome a review.

@ksromanov
Contributor

Unfortunately it does not work on my system:

In file included from /opt/csw/lib/gcc/sparc-sun-solaris2.10/5.5.0/include-fixed/iso/math_iso.h:19:0,
                 from /opt/csw/lib/gcc/sparc-sun-solaris2.10/5.5.0/include-fixed/math.h:22,
                 from floats.c:26:
/opt/csw/lib/gcc/sparc-sun-solaris2.10/5.5.0/include-fixed/sys/feature_tests.h:346:2: error: #error "Compiler or options invalid for pre-UNIX 03 X/Open applications       and pre-2001 POSIX applications"
 #error "Compiler or options invalid for pre-UNIX 03 X/Open applications \

SunOS solaris 5.11 OpenSXCE2014.05__Illumos20140505 sun4u sparc SUNW,Sun-Blade-2500 Solaris

@tleedjarv
Contributor Author

@ksromanov I forgot to mention that this patch is only for x86_64 builds.
It should still be possible to get sparc builds working, but the native backend will not be available (#659).

The error you are getting could be related to the following define in floats.c (see a similar report here: #7256)

#define _XOPEN_SOURCE 700

I don't know what the fix would be here. It does not look like a sparc-related issue, more like having a rather old toolchain.

@ksromanov
Contributor

@tleedjarv, could you please rebase your branch? I have a patch for Solaris/SPARC, which is not published. Unfortunately it is quite old and has merge conflicts with current trunk. Maybe we can make a joint patch, which works with Solaris/SPARC and Illumos/(SPARC/x86).

@tleedjarv
Contributor Author

Rebased.

Maybe we can make a joint patch, which works with Solaris/SPARC and Illumos/(SPARC/x86).

@ksromanov Happy to do a joint patch!

@ksromanov
Contributor

ksromanov commented Dec 29, 2020

@tleedjarv, Tõivo, would you please check that my changes (see PR to your fork) do not break anything on your system?

I checked the patch - it works on both
SunOS solaris 5.11 OpenSXCE2014.05__Illumos20140505 sun4u sparc SUNW,Sun-Blade-2500 Solaris with Sun C 5.13 and

SunOS sundev1 5.11 11.3 sun4v sparc sun4v Solaris with Sun C 5.13 SunOS_sparc Patch 151632-10 2020/04/08

for 64-bit target. Due to alignment issues with the multicore-related domain-state.tbl, it does not compile for the 32-bit target; maybe I can find a Sun C alignment switch that fixes it.

It also seems not to break the regular linux/gcc target (I had to remove one include in ocamlyacc, which seems redundant. The include led to unresolved symbols on SPARC).

@tleedjarv
Contributor Author

Very good, Konstantin.

It also works for me on SunOS bld 5.11 omnios-r151034-831ff8e83b i86pc i386 i86pc with gcc (64-bit target). I don't think I have a Sun compiler anywhere to test. Changes in your commits look good to me.

Shall I merge your commits to this PR?

@ksromanov
Contributor

Shall I merge your commits to this PR?

Yes, please.

@tleedjarv
Contributor Author

Shall I merge your commits to this PR?

Yes, please.

Done. Konstantin, could you propose an update to the Changes file to include what you have done?

@ksromanov
Contributor

ksromanov commented Dec 29, 2020

Could you please replace

- #10063: (Re-)enable building on illumos (SmartOS, OmniOS, ...) and other
   OpenSolaris-based platforms (x86_64 and GCC only) (partially revert #2024).
   (Tõivo Leedjärv, review by ??)

with

- #10063: (Re-)enable building on illumos (SmartOS, OmniOS, ...) and
   Oracle Solaris; x86_64/GCC and 64-bit SPARC/Sun PRO C compilers. (partially revert #2024).
   (Tõivo Leedjärv, Konstantin Romanov, review by ??)

And, please, regenerate configure - I didn't commit the last version of it!

@tleedjarv tleedjarv changed the title (Re-)enable building on illumos (SmartOS, OmniOS, ...) (Re-)enable building on illumos (SmartOS, OmniOS, ...) and Solaris Dec 29, 2020
@ksromanov
Contributor

@tleedjarv

Tõivo,

  1. I checked the final branch - commit 8066568, everything compiles well with

./configure CC="cc -m64"; make world

on my systems (T5 and SunBlade 2500 Silver, see above). 32-bit builds do not work, but it is
acceptable for my use cases. Anyway, if I find a way to enable them, I'll post a fix here.

  2. I sent an e-mail to Sébastien Hinderer with the humble request to take a look at this PR. Hopefully he will have some time after New Year. We worked with him on AIX/Xlc patches before.

So, right now our work is done. Happy New Year!

@tleedjarv
Contributor Author

tleedjarv commented Dec 30, 2020

@ksromanov It seems that the removal of one include in ocamlyacc breaks the build on Windows (see AppVeyor logs).

I restored the include for Windows. Konstantin, can you check if the build is still working for you.

@ksromanov
Contributor

ksromanov commented Dec 30, 2020

@tleedjarv

I restored the include for Windows. Konstantin, can you check if the build is still working for you.

Unfortunately, Tõivo, it got broken on T5/Oracle Solaris (on SunBlade/Illumos everything works great). I'll find a solution, though I don't know how much time it will take. Anyway, this single include can be removed manually on our systems. So, let's leave it as is for now, while I am searching for the solution.

There are a couple of small PRs by David A. which alter configure.ac waiting in the queue. It is highly likely that they will get merged and we will get a merge conflict. Tõivo, would you please rebase our PR from time to time? I can help if needed.

@tleedjarv
Contributor Author

Anyway, this single include can be removed manually on our systems. So, let's leave it as is for now, while I am searching for the solution.

The include is now protected by #ifdef _WIN32, so if this breaks on Oracle Solaris for some reason (having _WIN32 defined there indicates a bug somewhere else), maybe you could just negate the condition and replace it with an #ifndef <__sun or __SUNPRO_C or whatever makes sense here>?
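A small sketch of the two guard styles mentioned above; it is an illustration, not the code in yacc. `_WIN32`, `__sun` and `__SUNPRO_C` are the predefined macros named in the comment, while `NEEDS_OSDEPS` and `build_target` are hypothetical names introduced here.

```c
/* Style 1: opt in on Windows only. In the real source the guarded line
   would be the caml/osdeps.h include rather than a macro definition. */
#if defined(_WIN32)
#define NEEDS_OSDEPS 1
#else
#define NEEDS_OSDEPS 0
#endif

/* Style 2 (the suggested alternative): opt out on Solaris / Sun C instead,
   i.e. wrap the include in
   #if !defined(__sun) && !defined(__SUNPRO_C) ... #endif */

/* Hypothetical helper reporting which branch the preprocessor took. */
static const char *build_target(void)
{
#if defined(_WIN32)
  return "windows";
#elif defined(__SUNPRO_C)
  return "sun-c";
#elif defined(__sun)
  return "solaris-other-compiler";
#else
  return "other";
#endif
}
```

Style 1 keeps every non-Windows platform on the same code path, which is why a failure on Solaris with this guard would point at a problem elsewhere.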

There are a couple of small PRs by David A. which alter configure.ac waiting in the queue. It is highly likely that they will get merged and we will get a merge conflict. Tõivo, would you please rebase our PR from time to time? I can help if needed.

No problem at all. I will continue maintaining and rebasing this PR for the foreseeable future.

@ksromanov
Contributor

ksromanov commented Dec 31, 2020

(having _WIN32 defined there indicates a bug somewhere else)

This hack with #ifdef _WIN32 works for my systems. As far as I understand, the root cause is that Sun PRO C does not respect the inline keyword (at least with default command line arguments). I will try to figure out a better fix.

@ksromanov
Contributor

ksromanov commented Jan 2, 2021

@tleedjarv

Tõivo, I have an update:

  1. I found a way to overcome the Sun PRO compiler problem with ocamlyacc without source code changes: the optimization level must be at least -O4.

Could you please

a. Accept my PR - tleedjarv#2 (and please test it).
b. Remove two commits:

"Remove caml/osdeps.h include from yacc." by me and
"Keep caml/osdeps.h include in yacc for Windows" by you

c. regenerate configure or use mine.
d. test on your system.

  2. Unfortunately the 32-bit target for SPARC still does not work, and I don't think that I'll be able to fix it in the near future.

  3. I have added notes to INSTALL.adoc, since compilation on SPARC/Solaris is not trivial. Please add a working example if it is not a simple ./configure; make world on x86/Illumos.

@tleedjarv tleedjarv force-pushed the solaris-build branch 3 times, most recently from d66e125 to 7490b89 Compare January 2, 2021 21:30
@tleedjarv
Contributor Author

@ksromanov Done and done. Works well on my system.

I had to regenerate configure because your version had some unrelated differences in there (different autoconf version?).

I don't have any special compilation instructions to add. I use ./configure; make myself and it works.

(On another note, rebasing and changing commits added my name to your commits. I hope it's ok, but I guess it's possible to clean it up as well.)

@ksromanov
Contributor

@tleedjarv,

Thanks a lot! I checked the result on our systems (T5 and SunBlade - everything works for 64-bit).

Unfortunately, I found one redundant commit - "Regenerate configure" ef2f0d1

Would you please remove it to clean up our commit history a bit?

I hope it's ok, but I guess it's possible to clean it up as well.

Yes, of course! (actually, I don't care about the authorship of these commits at all)

Member

@gasche gasche left a comment

@ksromanov asked me if I could have a look at this patch. It looks okay to me, with most of the changes reasonable-looking and in configure.ac. The changes to the actual runtime code are minimal, and mostly look fine. The special-casing of __SUNPRO_C to silence _XOPEN_SOURCE definition is a bit unpleasant, but it's a sensible choice if we do want to make the users of those systems happy.

I am not knowledgeable enough in this part of the system to give an approval stamp myself, but I left review notes in the parts of the patch that I think are worth looking at for an expert. (Maybe count this as half-an-approval; easy to review, only one half remaining!)

@@ -1491,10 +1514,11 @@ AC_CHECK_HEADER([sys/mman.h],
AC_CHECK_FUNC([pwrite], [AC_DEFINE([HAS_PWRITE])])

## -fdebug-prefix-map support by the C compiler
-AS_CASE([$CC,$host],
+AS_CASE([$ocaml_cv_cc_vendor,$host],
Member

Review note: this change affects all architectures, but actually it only matters when the cc_vendor is xlc and sunc, so it is actually rather safe.

Contributor

xlc and sunc are the oldest compilers of all 5 supported.

otherlibs/systhreads/Makefile
runtime/floats.c Outdated
#define _XOPEN_SOURCE 700
#endif
Member

It is strange that this (and the other disabling of XOPEN_SOURCE like that) would be necessary. I looked around the web and it looks rather similar to this issue. I think it would be nice to have a comment in the source as to why __SUNPRO_C is handled specially here.

Contributor

Unfortunately this is a problem of old standard library of Sun PRO C. I do not know a better way to fix it. We can add a comment about it.

Contributor

Thanks to Xavier Leroy, I found out that it is not necessary and can be replaced by the -D_XPG6 compiler option. Will remove.

@shindere
Contributor

shindere commented Jan 5, 2021 via email

@ksromanov
Contributor

ksromanov commented Jan 5, 2021

@shindere thank you very much for reviewing this PR!

And regarding the -O4 flag, am I understanding correctly that it's the only way to get functions inlined?

Yes, exactly. Otherwise Sun PRO C emits them as external functions. These functions reference other functions that are not linked into ocamlyacc, so the Solaris linker ends up with unresolved externals. This is a problem on the Solaris/Sun PRO C side, and I don't want to modify/spoil the ocamlyacc code to work around it.

From my point of view, in about 10 years we will forget about SPARC target, so it is better to leave this in configure.ac. Hopefully most of the code changes are needed for Illumos/x86-64, which will live much longer than Solaris/SPARC target.

Should we update INSTALL.adoc mentioning that the optimization level on Solaris/sunc/SPARC should be no less than -O4?

Regarding the commit history: would it please be possible to make sure that configure is re-generated each time configure.ac is modified? It's better to keep both in sync for each commit because it makes bisection easier.

  1. Maybe we just squash these commits?

  2. Otherwise @tleedjarv or myself can just rewrite the commits, running autogen in between. Unfortunately we have different versions of autoconf, so it should be a single person.

Contributor

@xavierleroy xavierleroy left a comment

I need to better understand several of the proposed changes, see my questions below.

Also: did you test with GCC or Clang as the C compiler? Performance of bytecode programs will suck with the Sun Pro compiler, assuming it does not implement the GNU extensions to C. That's why we strongly recommend GCC or Clang.

otherlibs/unix/execvp.c Outdated
Comment on lines 18 to 20
#if !defined(__SUNPRO_C)
#define _XOPEN_SOURCE 700
#endif
Contributor

I need to understand the problem better. _XOPEN_SOURCE 700 corresponds to the Single Unix Specification version 4. https://docs.oracle.com/cd/E88353_01/html/E37853/xpg6-7.html says that SUSv4 is supported provided the c99 option is given to the compiler, and then _XOPEN_SOURCE is set to 700 (so, re-defining it should be a no-op).

  1. Why are things not working for you as described in Oracle's docs?
  2. If you compile with gcc but with the same set of standard include files, __SUNPRO_C will be false and _XOPEN_SOURCE will be defined to 700. Does this cause the same errors?

Contributor

@ksromanov ksromanov Jan 5, 2021

On

SunOS sundev13 5.11 11.3 sun4v sparc sun4v Solaris with Sun PRO C I get an error

"/usr/include/sys/feature_tests.h", line 354: #error: "Compiler or options invalid for pre-UNIX 03 X/Open applications 	and pre-2001 POSIX applications"

another way to fix it is to define -D_XPG6="" for Solaris/Sun PRO C, which can be done in configure! Thanks a lot, @xavierleroy , we need to remove this change to link.c! See tleedjarv#3

So,

  1. Perhaps it is because _XPG6 was not defined. Oracle docs say (see your link)

"A default Oracle Solaris installation might require additional steps in order to be fully SUSv4 compliant."

Apparently we do not have these steps fulfilled. The best workaround seems to be to define _XPG6.

  2. Unfortunately we do not have gcc on SPARC systems in BB. I tried clang, and yes, it does not like this place (floats.c and link.c have the same _XOPEN_SOURCE define):
clang -c -O2 -fno-strict-aliasing -fwrapv -Wall -Wdeclaration-after-statement -Werror -fno-common -g  -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE   -DCAMLDLLIMPORT=  -std=c99 -o floats.b.o floats.c
floats.c:21:9: error: '_XOPEN_SOURCE' macro redefined [-Werror,-Wmacro-redefined]
#define _XOPEN_SOURCE 700
        ^
<built-in>:311:9: note: previous definition is here
#define _XOPEN_SOURCE 600
        ^
In file included from floats.c:26:
In file included from /usr/include/math.h:17:
In file included from /usr/include/iso/math_iso.h:12:
/usr/include/sys/feature_tests.h:354:2: error: "Compiler or options invalid for pre-UNIX 03 X/Open applications 	and pre-2001 POSIX applications"
#error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
 ^
2 errors generated.

defining _XPG6 solves only the second error:

clang -c -O2 -fno-strict-aliasing -fwrapv -Wall -Wdeclaration-after-statement -Werror -fno-common -g  -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE   -DCAMLDLLIMPORT=  -D_XPG6="" -std=c99 -o floats.b.o floats.c
floats.c:21:9: error: '_XOPEN_SOURCE' macro redefined [-Werror,-Wmacro-redefined]
#define _XOPEN_SOURCE 700
        ^
<built-in>:311:9: note: previous definition is here
#define _XOPEN_SOURCE 600
        ^
1 error generated.
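The `-Wmacro-redefined` error in the two clang logs above comes from the file redefining a macro that the compiler already predefines with another value. As an illustration only (this is not the fix adopted in the PR, which moved to `-D_XPG6` in configure), a source file can sidestep that particular diagnostic by undefining the macro first. Whether the system headers then accept the value 700 still depends on the compiler mode and on `_XPG6`, as the logs show. The helper name `xopen_level` is hypothetical.

```c
/* Drop any predefined value (clang on this system predefines 600)
   before requesting the level this file wants. */
#ifdef _XOPEN_SOURCE
#undef _XOPEN_SOURCE
#endif
#define _XOPEN_SOURCE 700

#include <math.h>

/* Hypothetical helper: expose the value the preprocessor ended up with. */
static long xopen_level(void)
{
  return _XOPEN_SOURCE;
}
```

Like any feature-test macro, this only influences headers included after it, so it belongs at the very top of the file.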

@shindere
Contributor

shindere commented Jan 5, 2021 via email

@xavierleroy
Contributor

Maybe we just squash these commits?

I would prefer to squash everything in the end, but this can be done by a "squash and merge" on Github, no need to change your source repository.

Before this commit, the filter-locations.sh script used in two
backtrace tests was using the GNU -o command-line option to grep.

This commit rewrites the script so that it does not rely on this option
any longer.

This requires rewriting the reference files for the two tests in question.
@tleedjarv
Contributor Author

I can confirm I also see the core dump. I did a little bit of debugging and from what I can tell, segv_handler is called, which in turn calls caml_raise_stack_overflow, which in turn calls caml_raise. What exactly fails there, I can't tell.
Also, it is only the second test that fails. The first successfully produces output Stack overflow caught.

@shindere
Contributor

shindere commented Jan 13, 2021 via email

@xavierleroy
Contributor

I could not reproduce the crash with the tests/runtime-errors/stackoverflow.ml test, using the same OmniOS VM that @shindere uses.

@shindere
Contributor

shindere commented Jan 14, 2021 via email

@xavierleroy
Contributor

All right, I see the segfault now.

I had to install gdb (because mdb is no fun), from sources (because I couldn't find a package for it, that's sad).

Turns out that the si_addr of the siginfo_t passed to the SEGV signal handler does not contain the faulting address: in this run, si_addr is 0xfffffc7f00005cec while the fault occurs when writing at 0xfffffc7fff400000 - 8. The SEGV handler rejects the faulting address as bogus (it's not even word-aligned).

I've exhausted the time I can spend on this PR, so I will not debug any further.

Please turn stack overflow detection off in the configure script.

Stack overflow detection is not working properly on Solaris.
@tleedjarv
Contributor Author

Thank you Xavier, I really appreciate your taking the time.
Stack overflow detection now turned off.

@xavierleroy
Contributor

Thank you!

After 108 messages and 23 commits, it's time to merge this PR. I'll preserve the commit history because it's pretty clean. Thanks to all who participated.

@xavierleroy xavierleroy merged commit 42b0efb into ocaml:trunk Jan 14, 2021
@ksromanov
Contributor

ksromanov commented Jan 14, 2021

Thank you very much!

@Octachron are you planning to cherry-pick the final set of commits into the 4.12 branch?

@tleedjarv tleedjarv deleted the solaris-build branch January 14, 2021 19:05
@xavierleroy
Contributor

My opinion: 4.12.0 is in beta already, and I feel this set of changes is too big to be added at the last moment.

@Octachron
Member

At the same time, for operating systems other than illumos or OpenSolaris, this PR does not change anything, does it?

It seems to me that having experimental support for illumos and OpenSolaris available in the wild would garner more scrutiny than letting the patch linger on trunk for one release cycle.

@shindere
Contributor

shindere commented Jan 15, 2021 via email

@gasche
Member

gasche commented Jan 15, 2021

(copied over from #10155, because the issue is better discussed here)

I tried to understand what could be the cause of the Windows issue, here is a summary. Some of the tests in testsuite/tests/backtrace check that we get backtraces as intended. When the backtrace-printing format changed recently ('Called from "foo.ml"' => 'Called from function_bar in file "foo.ml"'), someone wrote a grep script to post-process both backtrace formats into a common reference format, to have a single reference file work before and after the backtrace change. The script uses grep -o which apparently is not POSIX, so in the present PR it was changed to use a grep+sed combination (this also makes the output more readable). I suspect that the use of sed s/foo.*// may be causing Windows incompatibilities, for example by capturing and removing the \r at the end of the line.

I think that a sensible fix would be to remove the script and have a reference file that is specific to the new backtrace format, and does not work with the older backtrace format. This makes exchanging testsuite changes between OCaml versions more cumbersome (in particular when rebasing fixes to older branches), but to me this is less painful than thinking about making sed work correctly on both Windows and Solaris. @dra27 may have a different perspective on things.

@shindere
Contributor

shindere commented Jan 15, 2021 via email

@gasche
Member

gasche commented Jan 15, 2021

After more investigation: my guess that the script was introduced to bridge over backtrace-printing differences was wrong. It was actually introduced long before, in 28dc832, when support for inlining in backtraces was introduced. (I suppose that the point was to remain robust in the reference file to different choices of inlining, typically between clambda and flambda.)

@shindere
Contributor

shindere commented Jan 15, 2021 via email

@gasche
Member

gasche commented Jan 15, 2021

I proposed a PR (#10157) to remove the sed s/foo.*// in the hope that it would fix the issue. I have no idea whether it will, though; let's wait and see what the CI thinks.

Note: getting input from test authors is nice, but right now our CI is broken, so I think we should try to fix it or revert the change.

@ksromanov ksromanov mentioned this pull request Jan 15, 2021
Octachron pushed a commit that referenced this pull request Jan 25, 2021
Re-enable building on x86-64 illumos (SmartOS, OmniOS, ...) and Solaris.

(cherry picked from commit 42b0efb)
@Octachron
Member

@ksromanov: we are currently testing this patch in the second beta for 4.12.0. If everything goes well it should be part of the 4.12.0 release.

@ksromanov
Contributor

@Octachron Thanks a lot!

@shindere
Contributor

shindere commented Jan 27, 2021 via email

shindere pushed a commit that referenced this pull request Jan 28, 2021
strchr() is standard and declared in <string.h>.
index() is legacy and declared in <strings.h>, which we don't include.
The discrepancy was found on Solaris as part of #10063.

Cherry-picked from 9eb015e in 4.12 by
Sébastien Hinderer.
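For readers unfamiliar with the distinction in this commit message, a minimal sketch of the portable form follows. The helper `value_after_equals` is hypothetical and is not the code touched by the commit; it only shows `strchr()` used with nothing but `<string.h>`.

```c
#include <string.h>   /* declares strchr(); index() lives in <strings.h> */

/* Hypothetical helper: return the text after the first '=' in `s`,
   or NULL when there is none. */
static const char *value_after_equals(const char *s)
{
  const char *eq = strchr(s, '=');
  return eq != NULL ? eq + 1 : NULL;
}
```

For example, `value_after_equals("CC=cc -m64")` points at `"cc -m64"`, and a string without `=` yields `NULL`.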
dbuenzli pushed a commit to dbuenzli/ocaml that referenced this pull request Mar 25, 2021
strchr() is standard and declared in <string.h>.
index() is legacy and declared in <strings.h>, which we don't include.
The discrepancy was found on Solaris as part of ocaml#10063.