Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default Leiningen :jvm-opts can cause ClojureScript compiler stack overflows #303

Closed
cemerick opened this issue Apr 1, 2014 · 15 comments
Closed

Comments

@cemerick
Copy link
Collaborator

cemerick commented Apr 1, 2014

I had a ClojureScript codebase that compiled fine under 0.0-2173, but failed with a stack overflow in the compiler under 0.0-2197. Motivated by wanting to not suss out the source of this "bug", I discovered a workaround (and, AFAICT, the root cause):

  • add :jvm-opts [] to project.clj

Leiningen by default adds two :jvm-opts to subprocesses it spawns, including those that perform ClojureScript compilation on behalf of lein-cljsbuild:

-XX:+TieredCompilation
-XX:TieredStopAtLevel=1

(These aid in improving JVM startup time (documented here for the Leiningen audience and here in brief for those interested in more JVM-level details), by running two compilers in the early stages of the process: the faster-to-start "client" compiler is given precedence, and only until the "server" compiler has captured enough instrumentation about "hot" methods are its results used.)

If these default options are removed (simply by specifying your own :jvm-opts), then my problematic CLJS compile completed without a problem.

Notably, adding a JVM option to increase thread stack size (doubling it here in my case) to the default Leiningen JVM options also yields a successful CLJS compilation for the project in question:

:jvm-opts ["-Xss2m" "-XX:+TieredCompilation" "-XX:TieredStopAtLevel=1"]

This almost makes sense: when you have two JITs running and/or gathering instrumentation data, that data may be stack-allocated, and thus contribute to thread stack size overhead; solutions include increasing the thread stack size, or turning off one of the compilers (i.e. not using tiered compilation).

Given this experience, I'm leaning towards overriding Leiningen's tiered compilation defaults in every lein-cljsbuild subprocess. Faster startup time is good, but troubleshooting this problem (esp. if reported by others) more than once has little appeal. An alternative would be to find where the ClojureScript compiler is consuming stack, and address that (probably by realizing lazy sequences earlier); while probably a good exercise in general, that's a much "softer" solution that may not yield an immediate resolution.

I'm open to any other suggestions / discussion around this issue. Also, if anyone can produce a minimal reproducible case, that would be very useful on a number of fronts (the project of mine that sparked this is ~2K LOC; hardly large, but not exactly something I'd like to spend a lot of time whittling down).

@cemerick cemerick added this to the 1.0.4 milestone Apr 1, 2014
@swannodette
Copy link

Are you using core.match?

@cemerick
Copy link
Collaborator Author

cemerick commented Apr 1, 2014

@swannodette I am. I have not changed my core.match dependency in some time though (0.2.0).

@swannodette
Copy link

@cemerick it's worth investigating whether the source produced by core.match might not be the culprit.

@cemerick
Copy link
Collaborator Author

cemerick commented Apr 1, 2014

@swannodette core.match is actually only used in a set of macros Clojure-side in this particular project, so I think I can safely exclude it; no match body should ever hit the CLJS compiler AFAICT. I'll check in more detail at some point, though.

@cemerick
Copy link
Collaborator Author

cemerick commented Apr 1, 2014

@swannodette Looks like core.async output is "causing" the stackoverflow; heavily-nested sets of conditionals, tons of state_val_XXXX identity checks. Running the state machines I presume.

Knowing this helps me narrow down the proximate source of the problem quite a bit. Hopefully I can produce a small example that will reproduce reliably.

@swannodette
Copy link

@cemerick yeah I was going to suggest looking at core.async next.

@cemerick
Copy link
Collaborator Author

cemerick commented Apr 1, 2014

Some further digging and experimentation leads me to think it's an interaction between usage of the is macro in clojurescript.test within go blocks. Each of the former emits a try/catch form, which, given what I know about the CSP transform, will yield significant reams of code when go adds what it needs to implement/cope with the flow control involved. (A couple of macroexpansions confirm this.) Removing just a couple of the is forms within one of the offending go blocks eliminates the stackoverflow.

(Unless I'm misreading the situation, it really is analogous to JVM method size limitations, but bounded in this case by the size of each set of stack frames corresponding to the traversal of each level in the analysis tree. I flailed around for 2 minutes adding some doalls to places in the analyzer where it looked like a map might produce a lazy, deeply nested structure, but didn't hit on anything effectual.)

I've blown my budget on this for today; if I can, will return with something actionable.

@swannodette
Copy link

@cemerick yep your analysis correct, you run into the same issues when using core.async with core.match http://dev.clojure.org/jira/browse/ASYNC-40

@cemerick
Copy link
Collaborator Author

cemerick commented Apr 1, 2014

Ouch.

The function in question in my case is ~1200 lines when compiled. I can't seem to cut much out of it before the stackoverflow goes away. FWIW, the example you link to in that issue does compile given the Leiningen defaults under CLJS 2197, so simple fn body size isn't the constraint w.r.t. the stackoverflow.

@swannodette
Copy link

@cemerick yes that match is pretty simple but try putting something like the following in a go block:

(ns foo
  (:require-macros
    [clojure.core.match :as m]
    [cljs.core.match.macros :refer [match match*]])
  (:require [cljs.core.match]))

(let [n [:black [:red [:red 1 2 3] 3 4] 5 6]]
  (time
    (dotimes [_ 1e6]
      (match* [n]
        [(:or [:black [:red [:red a x b] y c] z d]
              [:black [:red a x [:red b y c]] z d]
              [:black a x [:red [:red b y c] z d]]
              [:black a x [:red b y [:red c z d]]])] :balance
        :else :valid))))

@cemerick
Copy link
Collaborator Author

cemerick commented Apr 1, 2014

I shall, next opportunity I have. But, there are two separate issues here, correct?

  1. Whether the ClojureScript compiler can reliably run to completion
  2. Whether the code it emits is efficient

We can definitely achieve (1) (with more JVM-level resources if necessary as a stopgap, and with constant-space compiler refactorings long-term).

(2) seems much more challenging to me, given the generality of the objectives of things like go. I don't know that something like :no-transform metadata (mentioned as a solution in ASYNC-40) will be very satisfying, given the explicit support that it requires among many different parties, and its narrow applicability (i.e. can't be put on bodies that may contain arbitrary computation, but perhaps often do not).

@swannodette
Copy link

@cemerick the issue has very little to do with 1 or 2. The go macro should be smarter or provide an escape hatch.

@cemerick
Copy link
Collaborator Author

cemerick commented Apr 1, 2014

My only point with (1) was that no macro (short of emitting an infinite seq as part of its expansion, perhaps, though that also can be thresholded so as to provide a useful response) should cause such a compiler failure.

@swannodette
Copy link

@cemerick I don't really see how that is possible with the current recursive descent compiler. Definitely not something in scope in the near future. To be extra clear far as I know sufficiently complex Clojure source will stack overflow even the standard JVM compiler - that's why I posted the previous example from core.match.

@cemerick cemerick removed this from the 1.0.4 milestone Dec 15, 2014
@cemerick
Copy link
Collaborator Author

Not cljsbuild-related, track http://dev.clojure.org/jira/browse/ASYNC-40. Thanks, @swannodette.

rnewman added a commit to ncalexan/datomish-old that referenced this issue Aug 19, 2016
This allows CLJS compilation to complete for complex go-pair forms.

See <emezeske/lein-cljsbuild#303> for more details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants