-linscan option crashes ocamlopt #7631

Closed
vicuna opened this issue Sep 18, 2017 · 17 comments

@vicuna
commented Sep 18, 2017

Original bug ID: 7631
Reporter: @psteckler
Assigned to: @gasche
Status: resolved (set by @gasche on 2017-09-19T14:19:40Z)
Resolution: fixed
Priority: normal
Severity: crash
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: compiler driver
Child of: #7630

Bug description

I have OPAM 4.06.0+fp+flambda installed (not available from the Mantis dropdown).

Running `make' with the attached code, I get a crash, with the stack trace:

$ make
ocamlopt -linscan -o runme b.ml a.ml
Fatal error: exception Invalid_argument("index out of bounds")
Raised by primitive operation at file "asmcomp/linscan.ml", line 111, characters 19-35
Called from file "list.ml", line 100, characters 12-15
Called from file "asmcomp/linscan.ml", line 115, characters 10-56
Called from file "asmcomp/linscan.ml", line 169, characters 4-28
Called from file "list.ml", line 100, characters 12-15
Called from file "asmcomp/asmgen.ml", line 87, characters 4-32
Called from file "utils/misc.ml", line 28, characters 20-27
Re-raised at file "utils/misc.ml", line 28, characters 50-57

Steps to reproduce

$ tar -xzf linscan.tgz
$ make

File attachments

@vicuna

commented Sep 19, 2017

Comment author: @xclerc

The problem seems to be related to frame pointers:

  • opam switch 4.06.0+trunk+fp+flambda -> failure;
  • opam switch 4.06.0+trunk+fp -> failure;
  • opam switch 4.06.0+trunk+flambda -> success.
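To re-run this comparison, a minimal shell sketch might look like the following. It assumes the three switches are already installed and that b.ml and a.ml from the attachment are in the current directory. `try_switch` is a hypothetical helper name; the ocamlopt command line is the one printed by `make` in the report.

```shell
# try_switch NAME: select an opam switch, then attempt the build that crashed.
# Prints "NAME: success" or "NAME: failure" depending on ocamlopt's exit status.
try_switch() {
    opam switch "$1" >/dev/null 2>&1 || { echo "$1: switch unavailable"; return 1; }
    # Load the switch's environment into the current shell.
    eval "$(opam config env)"
    if ocamlopt -linscan -o runme b.ml a.ml >/dev/null 2>&1; then
        echo "$1: success"
    else
        echo "$1: failure"
    fi
}

# Example invocation (left commented out so nothing runs on load):
#   for sw in 4.06.0+trunk+fp+flambda 4.06.0+trunk+fp 4.06.0+trunk+flambda; do
#       try_switch "$sw"
#   done
```
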
@vicuna

commented Sep 19, 2017

Comment author: @xclerc

Tentative fix: #1355

@vicuna

commented Sep 19, 2017

Comment author: @gasche

Xavier Clerc's PR above is now merged in trunk and in the 4.06 branch. Paul, would you mind giving -linscan another try?

@vicuna vicuna closed this Sep 19, 2017

@vicuna

commented Sep 19, 2017

Comment author: @psteckler

I cloned trunk, ran configure with the "-with-frame-pointers" and "-flambda" options, then "make world" and "make opt".

With the "-linscan" flag to ocamlopt, there is no crash, and the time is about 5.5 sec, much faster than the 11+ sec I saw with OPAM 4.06.0+fp+flambda when run without "-linscan". That's all good.

But -- for trunk without the "-linscan" flag, the time blows up to 42+ sec. That seems intolerable for compiling these two small files. Of that, over 36 sec is for register allocation, as given by "-dtimings". Maybe that's a new bug to file?
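For reference, the two measurements can be reproduced with something like the sketch below. It assumes b.ml and a.ml are in the current directory; `build_with` and the `OCAMLOPT` override are hypothetical names introduced here so that a specific compiler build can be selected. The "-dtimings" and "-linscan" flags are the ones mentioned above.

```shell
# Compiler to test; override to point at a locally built ocamlopt.
OCAMLOPT="${OCAMLOPT:-ocamlopt}"

# build_with [extra flags...]: compile the two files and print per-pass timings.
build_with() {
    "$OCAMLOPT" -dtimings "$@" -o runme b.ml a.ml
}

# Example invocations (left commented out so nothing runs on load):
#   build_with            # default register allocator
#   build_with -linscan   # linear-scan register allocator
```

Comparing the register allocation line of the two "-dtimings" outputs shows how much of the total time that pass accounts for.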

@vicuna

commented Sep 19, 2017

Comment author: @xclerc

Sorry, just to be sure, your results are:

  • 5.5 s for trunk/flambda/fp/linscan;
  • 11 s for 4.06/flambda/fp;
  • 42 s for trunk/flambda/fp.

I am surprised by the gap between 4.06 and trunk when
not using linscan.

By the way, if my code is correct, one of the interference
graphs for b.ml has more than 20K edges. So I am not sure
whether this is a bug or you just hit a "bad case" of the
algorithm.

@vicuna

commented Sep 19, 2017

Comment author: @ejgallego

Paul, what is the time difference without linscan between flambda and non-flambda [with standard regalloc]?

@vicuna

commented Sep 19, 2017

Comment author: @psteckler

Xavier, yes, those numbers are right, and that difference surprised me, too.

Emilio: For trunk+fp (no flambda), 2.4 sec without linscan, 1.2 sec with linscan.

@vicuna

commented Sep 19, 2017

Comment author: @ejgallego

Ok so flambda is adding 40 additional seconds to register allocation, even when -Oclassic is used.

That doesn't look right to me; there ought to be some other problematic codepath.

@vicuna

commented Sep 19, 2017

Comment author: @gasche

The branches "trunk" and "4.06" are virtually identical as of today, as the release branch (4.06) was branched yesterday. It is not possible to observe:

  • 11 s for 4.06/flambda/fp;
  • 42 s for trunk/flambda/fp

unless there is a measurement error.

(It may be interesting to test 4.05 as well, as that could help catch a regression in 4.06.)

Two other remarks:

  • I do find it plausible that flambda would add 40s to register allocation by merging declarations into locals too aggressively; similar blowups of the graph coloring code have been observed in the past on some unusual (human-written) code shapes.
  • I'm not sure why you are systematically using the frame-pointers option for testing. It decreases performance (slightly), and (on a Linux system at least) it should not be necessary for good debug information, as we generate DWARF/CFI information that should allow the stack frames to be reconstructed without a frame pointer.
@vicuna

commented Sep 19, 2017

Comment author: @ejgallego

Hi Gabriel, indeed the 11 vs 42 numbers seem suspicious.

> I do find it plausible that flambda would add 40s to register allocation by merging declarations into locals too aggressively.

Excuse my unfamiliarity with flambda, but should such merging happen even when using -Oclassic? [I tried more esoteric options too, with the same results.]

@vicuna

commented Sep 19, 2017

Comment author: @psteckler

Just to be sure I wasn't hallucinating, I ran these again.

For OPAM 4.06.0+fp+flambda:

$ opam switch 4.06.0+trunk+fp+flambda

To setup the new switch in the current shell, you need to run:

eval `opam config env`
steck@felafel ~/tmp/flambda $ eval `opam config env`
steck@felafel ~/tmp/flambda $ time make
ocamlopt -o runme b.ml a.ml

real 0m11.107s
user 0m10.656s
sys 0m0.088s

For trunk+flambda+fp (omitting the build, install steps):

--
$ time make
ocamlopt -o runme b.ml a.ml

real 0m42.593s
user 0m42.496s
sys 0m0.068s

--

When was the OPAM package for 4.06.0+fp+flambda created? It could be significantly older than what's on GitHub now.

I've been using frame pointers, because I'm told it gives more information for Linux "perf". Is that not true?

@vicuna

commented Sep 19, 2017

Comment author: @psteckler

I downloaded the 4.05.0 sources, configured with frame-pointers and flambda (for apples-to-apples comparison):

$ time make
ocamlopt -o runme b.ml a.ml

real 0m43.600s
user 0m43.200s
sys 0m0.372s

So -- about the same as with trunk+fp+flambda.

That suggests there's something odd about the OPAM 4.06.0+fp+flambda.

@vicuna

commented Sep 19, 2017

Comment author: @ejgallego

Paul, in this case you may want to pull directly from OCaml's GitHub repository just to be sure. Compiling OCaml is fairly easy [at least on Linux].

@vicuna

commented Sep 20, 2017

Comment author: @gasche

> I've been using frame pointers, because I'm told
> it gives more information for Linux "perf". Is that not true?

I think that you can get perf to recover stack traces from dwarf information by using

perf record --call-graph dwarf

See

https://ocaml.org/learn/tutorials/performance_and_profiling.html#Using-perf-on-Linux
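As a rough sketch of the two approaches, assuming the `runme` binary built earlier in this thread is in the current directory: `record_callgraph` and the `PERF` override are hypothetical names introduced here, while `--call-graph`, `-o` and `perf report -i` are standard perf options.

```shell
# perf binary to use; override if perf is installed under another name.
PERF="${PERF:-perf}"

# record_callgraph MODE: profile ./runme, unwinding stacks with MODE.
#   dwarf  unwinds from DWARF/CFI data, no frame pointers needed in the binary
#   fp     walks frame pointers, needs a compiler built with frame pointers
record_callgraph() {
    "$PERF" record --call-graph "$1" -o "perf-$1.data" ./runme
}

# Example invocations (left commented out so nothing runs on load):
#   record_callgraph dwarf
#   perf report -i perf-dwarf.data
```
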

@vicuna

commented Sep 20, 2017

Comment author: @xclerc

Sorry to report that I cannot reproduce the problem: I get
the same timings with trunk+fp+flambda and 4.06.0+fp+flambda,
which is expected for the reasons pointed out by Gabriel.

@psteckler: the 4.06.0+fp+flambda opam compiler description
uses "https://github.com/ocaml/ocaml/archive/trunk.tar.gz"
as its source. My understanding is that it is hence the latest
version from the "trunk" branch when you install the switch.

@vicuna

commented Sep 20, 2017

Comment author: @psteckler

Emilio: the trunk version I built was pulled from GitHub. The 4.05+fp+flambda was from the official source distribution.

Xavier: my OPAM 4.06.0+fp+flambda was built on 18 September, around 1600 US EDT.

I just built 4.06+fp+flambda from the current GitHub sources, and the time is about 43 sec, as expected.

So again, there's something funny about the OPAM version, and I think we should just disregard it.

@vicuna

commented Sep 28, 2017

Comment author: @psteckler

@gasche I just tried using --call-graph dwarf with perf, with a non-fp OCaml 4.05.0.

Yes, it seems to work, but the profile file is more than 20 times larger, and takes a very long time to load with "perf report".
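The size increase is expected in dwarf mode: perf saves a dump of the user stack with every sample (8192 bytes by default, according to the perf-record man page) and unwinds it later. One possible mitigation, sketched below, is to request a smaller dump. `record_dwarf` and the `PERF` override are hypothetical names, and the right size for a given program is an assumption to be checked, since call chains deeper than the dump will be truncated.

```shell
# perf binary to use; override if perf is installed under another name.
PERF="${PERF:-perf}"

# record_dwarf [BYTES]: profile ./runme with DWARF unwinding, saving BYTES
# of user stack per sample (defaults to perf's documented 8192).
record_dwarf() {
    "$PERF" record --call-graph "dwarf,${1:-8192}" -o perf-dwarf.data ./runme
}

# Example invocation (left commented out so nothing runs on load):
#   record_dwarf 4096   # smaller profile file, shallower recoverable stacks
```
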
