Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some bugfixes to the PARI package #10430

Closed
jdemeyer opened this issue Dec 4, 2010 · 70 comments
Closed

Add some bugfixes to the PARI package #10430

jdemeyer opened this issue Dec 4, 2010 · 70 comments

Comments

@jdemeyer

This comment has been minimized.

@jdemeyer jdemeyer changed the title Add some bugfixes to PARI Add some bugfixes to the PARI package Dec 4, 2010
@nexttime
Copy link
Mannequin

nexttime mannequin commented Dec 5, 2010

comment:2

Perhaps we should really also address #10120, as more systems than originally reported seem to be affected, i.e. reduce (perhaps partially) optimization to -O1 to work around obvious bugs in GCC 4.4.1 on these platforms.

Did someone report this to the PARI guys? Perhaps they could provide a patch such that we don't have to maintain it (that selectively changes the compiler flags for only some files).

Unfortunately(?), not all people building on e.g. openSUSE 11.2 run into these problems, apparently.

@jdemeyer
Copy link
Author

jdemeyer commented Dec 5, 2010

comment:3

Replying to @nexttime:

Perhaps we should really also address #10120, as more systems than originally reported seem to be affected, i.e. reduce (perhaps partially) optimization to -O1 to work around obvious bugs in GCC 4.4.1 on these platforms.

Here's an idea: we first try to build with -O3 and when that doesn't work, fall back to -O2, then -O1, then -O0.

This way we don't have to find out exactly which versions of gcc are broken.

I think reporting this to PARI is pointless, because they can't help (and probably won't care about) a broken gcc.

@nexttime
Copy link
Mannequin

nexttime mannequin commented Dec 5, 2010

comment:4

Replying to @jdemeyer:

Replying to @nexttime:

Perhaps we should really also address #10120, as more systems than originally reported seem to be affected, i.e. reduce (perhaps partially) optimization to -O1 to work around obvious bugs in GCC 4.4.1 on these platforms.

Here's an idea: we first try to build with -O3 and when that doesn't work, fall back to -O2, then -O1, then -O0.

Koen reported:
"For reference: OpenSuse? 11.2 (gcc (SUSE Linux) 4.4.1 [gcc-4_4-branch revision 150839]) has the same problem when building PARI: on a machine with 64GB of RAM, it eventually fails after all memory is exhausted (takes hours). [...]"

So I don't think that's the way to go. (Other machines might start swapping, which effectively "freezes" some systems.)

Or should we do something like

    (ulimit -St 900; $MAKE) # Which value is appropriate?

?

I think reporting this to PARI is pointless, because they can't help (and probably won't care about) a broken gcc.

They at least perhaps have better experience which files are most likely to trigger failures due to GCC bugs.

@jdemeyer
Copy link
Author

jdemeyer commented Dec 5, 2010

comment:5

Replying to @nexttime:
Or should we do something like

    (ulimit -St 900; $MAKE) # Which value is appropriate?

?

How about ulimiting the memory?

@nexttime
Copy link
Mannequin

nexttime mannequin commented Dec 5, 2010

comment:6

Replying to @jdemeyer:

Replying to @nexttime:
Or should we do something like

    (ulimit -St 900; $MAKE) # Which value is appropriate?

?

How about ulimiting the memory?

Much harder to estimate, isn't it? (Feel free to test out adequate values, with -O3 etc.; perhaps something Dave likes...)

Ok, if a process starts thrashing, it won't consume much (user) CPU time as well.

@nexttime
Copy link
Mannequin

nexttime mannequin commented Dec 5, 2010

Changed keywords from pari spkg to pari spkg bugs patches

@sagetrac-drkirkby
Copy link
Mannequin

sagetrac-drkirkby mannequin commented Dec 5, 2010

comment:7

I don't think we should be changing ulimit. Sage used to unset it at one point, and that was changed in a trac ticket.

Changing it could cause all sorts of problems for someone. If Sage fails with the limit they set, then tough - they set the limit.

Once we start changing limits, we could cause other proceses to fail, which might be more important to someone.

Dave

@jdemeyer
Copy link
Author

jdemeyer commented Dec 5, 2010

comment:8

David: we could check the current value of ulimit -v to make sure we are only decreasing the value, not increasing.

I quickly tested ulimit -v on a few systems, this is what I found for the minimal power of 2 for ulimit -v to have a successful build of the pari spkg:

  1. Gentoo Linux, kernel 2.6.32, x86_64, gcc 4.6.0: 128 MB
  2. Ubuntu Linux 8.04.4 LTS, kernel 2.6.24, x86_64, gcc 4.5.1: 128 MB
  3. Mac OS X 10.4 PPC, gcc 4.0.1: ulimit -v doesn't seem to work

@nexttime
Copy link
Mannequin

nexttime mannequin commented Dec 5, 2010

comment:9

Replying to @sagetrac-drkirkby:

I don't think we should be changing ulimit. Sage used to unset it at one point, and that was changed in a trac ticket.

Changing it could cause all sorts of problems for someone. If Sage fails with the limit they set, then tough - they set the limit.

Once we start changing limits, we could cause other proceses to fail, which might be more important to someone.

We would only set limits in (PARI's) spkg-install.

Note that ulimit only affects the current process and its subprocesses (i.e. gets inherited), therefore I also used the parentheses in the example above.

ulimit is (also) a bash built-in btw. We could also limit its use to Linux.

And ordinary users (i.e., their processes) cannot increase limits once they are set.

@nexttime
Copy link
Mannequin

nexttime mannequin commented Dec 5, 2010

comment:10

P.S.:

If we do "trial building" with some limit(s), we should also make sure that the build actually failed due to a resource limit before retrying with less optimization, e.g. check that the exit code was 152 (SIGXCPU + 128) if we use a CPU time limit.

@nexttime
Copy link
Mannequin

nexttime mannequin commented Dec 5, 2010

comment:11

Replying to @nexttime:

P.S.:

If we do "trial building" with some limit(s), we should also make sure that the build actually failed due to a resource limit before retrying with less optimization, e.g. check that the exit code was 152 (SIGXCPU + 128) if we use a CPU time limit.

With ulimit -v I receive SIGKILL on exhausted memory, which isn't very specific...

@jdemeyer
Copy link
Author

jdemeyer commented Dec 5, 2010

comment:12

Replying to @nexttime:

P.S.:

If we do "trial building" with some limit(s), we should also make sure that the build actually failed due to a resource limit before retrying with less optimization

The build could fail for many various reasons, including but not limited to allocating too much memory. There are various other tickets where a PARI build fails because of a broken gcc. All these should be caught, not only the cases where we run out of memory.

@nexttime
Copy link
Mannequin

nexttime mannequin commented Dec 5, 2010

comment:13

Replying to @jdemeyer:

Replying to @nexttime:

P.S.:

If we do "trial building" with some limit(s), we should also make sure that the build actually failed due to a resource limit before retrying with less optimization

The build could fail for many various reasons, including but not limited to allocating too much memory. There are various other tickets where a PARI build fails because of a broken gcc. All these should be caught, not only the cases where we run out of memory.

Of course.

I wonder if we then would get PARI build errors due to GCC bugs reported any longer... ;-)

@nexttime
Copy link
Mannequin

nexttime mannequin commented Dec 6, 2010

comment:14

Got one more PARI flaw:

It installs three real copies of the shared library rather than one with two symbolic links to it.

Currently not sure if (but I believe) that's an upstream matter, or if we do that.

@jdemeyer
Copy link
Author

jdemeyer commented Dec 9, 2010

Author: Jeroen Demeyer

@jdemeyer

This comment has been minimized.

@jdemeyer

This comment has been minimized.

@jdemeyer
Copy link
Author

comment:16

Very preliminary spkg: http://sage.math.washington.edu/home/jdemeyer/spkg/pari-2.4.3.alpha.p1.spkg (not yet tested properly)

@jdemeyer
Copy link
Author

Doctest fixes

@jdemeyer
Copy link
Author

Attachment: 10430_branch_cut.patch.gz

spkg patch for reference

@jdemeyer
Copy link
Author

comment:17

Attachment: pari-2.4.3.alpha.p1.diff.gz

@jdemeyer

This comment has been minimized.

@jdemeyer
Copy link
Author

comment:49

Replying to @sagetrac-drkirkby:

One would need to ascertain if this is a gcc bug or a Pari bug. Badly written code will cause more aggressive optimisations to fail. It does not necessary mean it is a compiler bug.

Badly written code should never cause the compiler to crash or to use infinite memory. These things are certainly compiler bugs.

@jdemeyer

This comment has been minimized.

@jdemeyer
Copy link
Author

comment:52

I removed all the trial building code from the spkg in light of #10572. Also added patches for #10559. New spkg needs review.

@vbraun
Copy link
Member

vbraun commented Jan 12, 2011

comment:53

I'm happy with the current version, so I'll give this ticket a positive review. If any compiler bugs are still preventing pari from being built on some hardware then this should be reported to the gcc wrapper.

@vbraun
Copy link
Member

vbraun commented Jan 12, 2011

Changed reviewer from Leif Leonhardy to Leif Leonhardy, Volker Braun

@jdemeyer
Copy link
Author

Merged: sage-4.6.2.alpha1

@jdemeyer
Copy link
Author

comment:55

Testing on the buildbot seems to indicate there might still be some race conditions in parallel make install. So maybe we should avoid doing that.

@jdemeyer
Copy link
Author

Changed merged from sage-4.6.2.alpha1 to none

@jdemeyer jdemeyer reopened this Jan 20, 2011
@jdemeyer
Copy link
Author

comment:56

Fixed race conditions in make install by using -j1.

@jdemeyer

This comment has been minimized.

@vbraun
Copy link
Member

vbraun commented Jan 23, 2011

comment:57

That'll get rid of potential races in installation. Perhaps we should disable parallel make for all spkgs that don't use a proven build system like autotools or SCons. Chances are that any hand-rolled makefile has concurrency issues...

I'll take it that you are going to commit the changes to the included repository before adding the spkg to the next Sage release, because right now they are not.

@jdemeyer
Copy link
Author

comment:58

Replying to @vbraun:

I'll take it that you are going to commit the changes to the included repository before adding the spkg to the next Sage release, because right now they are not.

Yes, done.

@jdemeyer
Copy link
Author

Merged: sage-4.6.2.alpha2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants