Slow memory leak in Dotty's parser #1584
Comments
For completeness, let me quote @odersky's response here too:
|
@Blaisorblade the reason I bring up https://github.com/fommil/class-monkey is because it is not possible to "discard" a classloader that uses scalac at present because of several locations where files are not closed correctly (fundamentally |
@fommil Right now dotty relies on |
@fommil OK, I see that's relevant. Is that the only obstacle or is there more? |
that's probably the biggest memory leak that I'm aware of with |
Sounds like you'd like to "isolate" Scalac more. I feared it would be hard.* If we do that, there are fewer reasons to fix the slow leak discussed in this issue. *EDIT: I mean: I feared that encapsulating the compiler from outside would be an annoyance for clients, in which case Dotty would have to fix all memory leaks instead of relying on "compiler encapsulation". Relying on encapsulation would reduce non-functional requirements on Dotty (a bit à la Erlang). |
I would like to avoid leaking memory, even if it is only a little bit. It simply makes up-scaling of the compilation process more difficult. We already have problems with scalac because some resources are not correctly handled/freed. Especially code generation in our test suite is a pain because we have to ensure uniqueness of types/symbols manually. Resetting an existing scalac instance is simply not possible. Btw, we generate thousands of compilation units in tests, therefore even a small leak may have a larger impact. Because of complexity overhead, we often don't reuse existing instances of scalac but recreate them. This adds some startup overhead though. However, if dotc is able to minimize the startup overhead, it can leak as much memory as it wants because then we can simply trash it whenever we want. But I'm not sure to what extent it is possible to minimize the startup overhead. The largest problem is that for large projects a lot of dependencies may need to be recompiled. |
@sschaef Thanks. I don't claim to have a full overview of Dotty, but I'd guess this is better fixed sooner rather than later. But @odersky argued fixing this is not easy, so a few questions:
This leak is from interning names—Dotty stores forever (in an object field) all the unique names it ever saw, so reusing names won't cause leaks.
That won't help with the leak under discussion, since the leaked memory is stored in an object field. That's why I talk about discarding classloaders—unloading a class is the only way to discard static members. (Storing this table in |
Discarding state by throwing away classloaders is problematic for performance, as you also throw away all the JITted code. |
A pragmatic solution here would be to add a @Blaisorblade What did you mean by:
For large multi-project builds, one problem is that you end up with lots of duplication between the name tables of the compilers for each sub-project. Name table sharing would be worthwhile in those cases. This could still be achieved without a global name table by passing in an existing name table when creating the root context for a new compiler. |
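A rough sketch of that idea, in plain Java rather than Dotty's actual code. `NameTable`, `RootContext`, and both constructors are hypothetical names made up for illustration; the only point is that the table becomes an ordinary object that the caller can share between compilers or keep private to one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-compiler interning table: an ordinary object, not a static field.
final class NameTable {
    private final Map<String, String> names = new HashMap<>();

    // Returns the canonical instance for this name, storing it on first sight.
    synchronized String intern(String name) {
        String existing = names.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    synchronized int size() {
        return names.size();
    }
}

// Hypothetical root context that receives its name table from the caller.
final class RootContext {
    final NameTable names;

    // Fresh, isolated table for a compiler that should share nothing.
    RootContext() {
        this(new NameTable());
    }

    // Reuse a table that another compiler has already populated.
    RootContext(NameTable shared) {
        this.names = shared;
    }
}

final class SharedTableDemo {
    public static void main(String[] args) {
        // Two sub-project compilers share one table, so "List" is stored once.
        NameTable shared = new NameTable();
        RootContext core = new RootContext(shared);
        RootContext tests = new RootContext(shared);
        core.names.intern("scala");
        core.names.intern("List");
        tests.names.intern("List");
        tests.names.intern("assertEquals");
        System.out.println(shared.size()); // prints 3

        // A context created without a table gets its own, unaffected by the others.
        RootContext isolated = new RootContext();
        isolated.names.intern("List");
        System.out.println(isolated.names.size()); // prints 1
    }
}
```

In this shape, once every context that references a table becomes unreachable, the table is ordinary garbage, so no classloader has to be discarded to reclaim it.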
👍 💯
I just referred (badly?) to Odersky's rebuttal of the proposal in the parent thread:
|
Forgot: Do we probably want a more generic API for clients, in case it later needs to clear other object fields? |
Interning of symbol tables sounds very reasonable to me. We would never parse an entire project in ensime, just the sources the user is interested in. Well pointed out on JIT. If we can avoid having to use a classloader to use dottypc that would be good. If it used classloaders internally to just load resources, that's fine but please remember to close them after. Somewhat related is the functionality provided by scalap. With scalap we really do scan the entire classpath, every classfile, and if that were to introduce a memory leak we'd be in huge trouble. Will scalap continue or will there be a replacement for it? We only use a small part of scalap and it might be easier if you just wrote something from scratch. |
scalap can't be reused since pickling is completely different in dotty, you can get similar information (and much more) from unpickling tasty, though there's no public API to do this currently. |
Ok, good we caught this early, we definitely need that functionality to build our index. Should I create a ticket or will you? Happy to do a walkthrough of our usecase. |
I think it's possible to move the name table to the root context of a On Thu, Oct 13, 2016 at 7:12 AM, Sam Halliday notifications@github.com
Prof. Martin Odersky |
A walkthrough would be interesting. Note that when I say "public API" I really mean "external API that we plan to support", but there's no external API for anything in dotty yet (except the very limited https://github.com/lampepfl/dotty/tree/master/interfaces), for now using the compiler internal APIs is fine. There's no good usage examples now but I'm currently experimenting with using tasty to provide some IDE features such as jump to definition and (maybe) renaming, I'll keep you up to date. |
@odersky for ensime it might be useful to have a shared name table. Consider the case where we are spawning new presentationcompiler instances fairly regularly (e.g. one per sub-project, and for the project's tests, and maybe fresh PCs for refactoring operations). |
@smarter what do you mean by "jump to definition"? you'd need to understand where source jars are stored to do this, and also java. It seems like this is an area where what core dotty can do should be limited to avoid duplicated efforts in ensime. |
@fommil I'm ignoring java and outside projects, we store the source file path in Tasty so we should have the information we need, I'm not trying to duplicate ensime but rather to evaluate how various parts of dotty (including tasty, which was not explicitly designed for interactive use) can be used in IDEs. I'll have code and a write-up in a few days. |
@smarter I'm not sure you have everything you need in there, you won't know the location of the jar or the source directory root. If what you are providing is |
also, this could be awesome for debugging if this information is available at runtime by converting source/line to specific blocks of code! e.g. imagine being able to "step into lambda" |
I agree that it should be relative, not sure if it is right now, which is exactly why I'm testing these features :). In any case this is very much off-topic, we can chat more on the dotty gitter if you want. |
For anybody caring: the static table is still there under |
Splitting out of #1527 to give a clearer summary.
Dotty interns identifiers and never releases them, even if you drop the compiler instance, because the hashtable used for interning is stored in a static field. While this only leaks memory slowly, I'm not sure that's OK (though @odersky is not convinced this is a real problem). So if one wants to avoid this leak in Ensime/ScalaIDE, it seems one would have to load Dotty in a separate classloader.
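To make the leak described above concrete, here is a minimal JVM sketch in plain Java. It is not Dotty's code: `StaticNameTable`, `ToyCompiler`, and `parse` are hypothetical names, and a plain `HashMap` stands in for the real interning hashtable. It only illustrates that a table held in a static field outlives every compiler instance that filled it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an interning table held in a static field.
final class StaticNameTable {
    // Static: reachable for as long as the defining class (and its classloader) is loaded.
    private static final Map<String, String> NAMES = new HashMap<>();

    private StaticNameTable() {}

    // Returns the canonical instance for this name, storing it on first sight.
    static synchronized String intern(String name) {
        String existing = NAMES.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    static synchronized int size() {
        return NAMES.size();
    }
}

// Hypothetical compiler instance; it owns no name storage of its own.
final class ToyCompiler {
    void parse(String... identifiers) {
        for (String id : identifiers) {
            StaticNameTable.intern(id);
        }
    }
}

final class StaticLeakDemo {
    public static void main(String[] args) {
        ToyCompiler first = new ToyCompiler();
        first.parse("foo", "bar");
        first = null; // the instance is now unreachable, but its names are not

        ToyCompiler second = new ToyCompiler();
        second.parse("bar", "baz"); // "bar" is reused, only "baz" is new

        System.out.println(StaticNameTable.size()); // prints 3
    }
}
```

Reusing a name costs nothing extra, which is why the growth is slow; the table only grows with the number of distinct names seen over the lifetime of the class.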
I'm trying to anticipate whether this is indeed a problem, so pinging @fommil and @sschaef: