Expose type dependencies (call graph) for accurate incremental transpiling #99
Comments
I created a separate visitor that produces a cache of all the graph information, and then built an incremental reference manager around it. I need just a single callback within J2CL to run and maintain this. My first implementation put the code inside J2CL; I'm now separating it into its own project and will submit a PR for the small change I need in J2CL for this to work. I was working on this to provide incremental J2CL that could work with any IDE. Once the code is separated, I'll integrate it into Maven. ReferenceManager:
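The linked ReferenceManager isn't shown above; as a purely hypothetical sketch, the core of such a component is a reverse index from each type to the types that reference it, refreshed as files are retranspiled:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only, not the code referenced above.
class ReferenceManager {
  // referencedType -> types whose output may change when it changes
  private final Map<String, Set<String>> reverseRefs = new HashMap<>();

  /** Replaces the outgoing references recorded for a (re)transpiled type. */
  void update(String type, Set<String> referencedTypes) {
    reverseRefs.values().forEach(callers -> callers.remove(type));
    for (String referenced : referencedTypes) {
      reverseRefs.computeIfAbsent(referenced, k -> new HashSet<>()).add(type);
    }
  }

  /** Types that should be retranspiled when {@code type} changes. */
  Set<String> callersOf(String type) {
    return reverseRefs.getOrDefault(type, Set.of());
  }
}
```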
With J2CL, even compiling the whole JRE takes less than 3 seconds on a warmed-up worker. I would assume Gradle has concepts parallel to what we have in Bazel. So is incremental transpiling really necessary in practice for Gradle?
I think 3 seconds for transpiling plus many more seconds for Closure is too long for a usable "dev mode". Gradle does not have ijar/hjar but instead fingerprints the "incoming" classpath of tasks (it knows how to fingerprint a "compile classpath" differently from a "runtime classpath", only taking the ABI into account in the first case), but I'd expect that this eventually leads to the same results 🤞

I have a (non-incremental) prototype in Gradle. I need to try it with a larger project, and possibly optimize the performance of the tasks themselves. But given that "targets" are much bigger in non-Bazel projects (fwiw, my current GWT project has 90.4k sloc in 1275 Java files in the same "target", i.e. excluding the libraries that it uses, some of them "internal" to our project too; and SDM is still very usable!), and given that Gradle thought it worthwhile to work on incremental Java compilation despite the awful complexity (particularly annotation processing), I would assume that this would greatly help for J2CL as well. Let's gather some more numbers.
3 seconds is for a change in the JRE itself, which is one of the worst cases.
Yeah, let's gather some numbers and we can decide on the prioritization of this.
And thanks for looking into J2CL Gradle!!
AFAICT, the JRE is 26.5k sloc in 265 files; that's small compared to the 90.4k sloc in 1275 files I'm working with 😉 (and that's not counting all the generated code, as it makes heavy use of UiBinder, the Editor framework, and RequestFactory, and of course ClientBundle and i18n).
👍
@rluble heads up. OK, got it. For starters, to cover output-to-source mapping, we could easily have a flag so that the generated source map points to the original source; I think that's a generally useful option. But the list of dependencies of a file is more of a unique request. Curious whether it isn't easy to extract that information from the Java compilation that Gradle is doing? BTW, as a side note, I would like to get the parallel stripping + javac compilation out of the equation and rely completely on Turbine for that. We never need the bytecode other than the ijars/hjars. We can keep a separate stripper tool, but I wanted to share that in case it affects your plans.
Far from being done, but the code is now available at https://github.com/tbroyer/gradle-j2cl-plugin
That'd at least help us start experimenting 👍
Should be relatively easy to extract using ASM.
This means that the J2clTranspile task would need both the
I had long been floating the idea of stripping at the bytecode level using ASM, in addition to stripping at the source level with the GwtIncompatibleStripper. If the stripper is integrated into the transpiler, we could then just remove it, and continue to strip the bytecode with ASM in parallel.
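Not the stripper referred to above, just a rough sketch of the idea using the ASM tree API, assuming the convention of matching any annotation whose simple name is GwtIncompatible:

```java
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.tree.AnnotationNode;
import org.objectweb.asm.tree.ClassNode;

import java.util.List;

/** Sketch: drop fields and methods annotated @GwtIncompatible from a .class file. */
final class BytecodeStripper {

  static byte[] strip(byte[] classBytes) {
    ClassNode node = new ClassNode();
    new ClassReader(classBytes).accept(node, 0);

    // Remove members carrying a @GwtIncompatible annotation (matched by simple name).
    node.methods.removeIf(m ->
        hasGwtIncompatible(m.visibleAnnotations) || hasGwtIncompatible(m.invisibleAnnotations));
    node.fields.removeIf(f ->
        hasGwtIncompatible(f.visibleAnnotations) || hasGwtIncompatible(f.invisibleAnnotations));

    ClassWriter writer = new ClassWriter(0);
    node.accept(writer);
    return writer.toByteArray();
  }

  private static boolean hasGwtIncompatible(List<AnnotationNode> annotations) {
    return annotations != null
        && annotations.stream().anyMatch(a -> a.desc.endsWith("/GwtIncompatible;"));
  }
}
```

A real version would also need to handle types that are themselves annotated, but the tree API keeps the member-level case this small.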
I was assuming that since the Gradle Java plugin is producing this information, maybe we can simply access it. Going through ASM might be overkill if used only for this purpose.
I'm not sure how things are with rules_jvm_external, but we will very likely keep following whatever the Bazel Java solutions come up with.
For reference (I thought I already posted it, but can't find it), the way Gradle does it for JavaC is here: https://github.com/gradle/gradle/blob/v6.6.1/subprojects/language-java/src/main/java/org/gradle/api/internal/tasks/compile/incremental/ (and a bit in the parent package)
And it's more complex than J2CL, as the process also has to track files generated by annotation processors. This (more or less) matches what I described above for J2CL:
For reference, the meat of the incremental compiler, that makes use of this data, is |
I have added a TypeGraphStore to my fork. As well as dumping the graph info, it records whether an update to a class may have a cascading impact - such as JS class and method renames. It has a single touch point (see line 151) in J2clTranspiler, as well as the options config to enable it.
Thanks for sharing, @mdproctor. @tbroyer, I think I didn't get a direct response to my earlier question: assuming you use Gradle to do Java compilation, don't you already have access to this generated mapping file?
It might be possible to get access to this information (I haven't tried), but it is internal to Gradle and could change at any time. Some (or all) of it also won't be available (AFAICT) if the compilation output comes from a build cache. It also, as you said, assumes that the exact same code has already been compiled by javac. Tasks being building blocks, things can be configured slightly differently, breaking the incremental processing with no (easy) way to detect it. In a shared library, one could, for example, only compile (javac) the original sources, and not the stripped ones (fwiw, I have that ASM-based bytecode stripper working already; not sure if I'll actually use it, but I wanted to learn ASM 😉)
I have submitted a PR of my work to Vertispan's repository, Vertispan#12. I'm not expecting this to be merged yet, but it's a good time to circulate it for feedback. I'm not sure we need this included in J2CL, but it might be nice to have a hook so a custom user-provided visitor can be run. I was unable to use the LibraryInfo as its information is not complete and anything visible to JS is not exported, since it's more aimed at pruning - but it works in a similar way. I also created my own simple, CSV-style format to read/write the information, because it's compact and fast - much smaller than the JSON. For incremental refreshes, we want to cut every bit of time we can.

For this approach to work it expects the IDE to ensure all compilation errors have been resolved before the code is passed to J2CL, although it gracefully handles cases where that is not so. While a subset of .java files is passed for transpilation, it uses the .class output from the other classes to ensure everything resolves correctly. We have found this to work fine with our J2CL Quake port.

The code works in two phases. The first phase takes the incoming changeset (as defined by timestamp changes since the last iteration) and uses the graph information, if any, to build the expanded list of files that need to be retranspiled. This first phase is done before calling J2CL, such as we do in our Maven plugin already. The second phase runs after J2CL produces its compilation units, and it outputs the new graph information. Note that it has all the graph information from the previous call, so even if only a subset of Java files is passed, it is able to write out a full graph on each iteration.

The visitor builds up a list of all the members and builds a callee/caller graph. "Impact" tracking is added for a member that calls another member that has a different JS name specified in the annotation - it's possible to change the JS name, and the IDE will not trigger a refactor in the caller, so there is no timestamp change; this ensures the caller is still added to the transpilation changeset. As other issues are identified, they can be added to impact tracking too. I have added a unit test to check the main types of issues.

We have used this with the J2CL Quake port in our Maven toolchain, and iteration times went from 7s to 1.5s.
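As a hypothetical illustration of that first phase (not the code from the PR): given the reverse dependency graph saved by the previous run and the files whose timestamps changed, expand to the full set to retranspile. This sketch walks the graph transitively for simplicity; whether direct callers are enough is discussed further down the thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ChangesetExpander {

  /** Expands the timestamp-based changeset with every known dependant. */
  static Set<String> expand(Set<String> changedFiles, Map<String, Set<String>> reverseDeps) {
    Set<String> toTranspile = new HashSet<>(changedFiles);
    Deque<String> pending = new ArrayDeque<>(changedFiles);
    while (!pending.isEmpty()) {
      String file = pending.pop();
      for (String dependant : reverseDeps.getOrDefault(file, Set.of())) {
        if (toTranspile.add(dependant)) {
          pending.push(dependant);
        }
      }
    }
    return toTranspile;
  }
}
```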
Could we customize the library info with a flag so that you can use the information provided by it instead of introducing a completely independent concept? If it doesn't require heavy customization, that would be a better option. And is the output we would generate going to be useful for both Gradle and Maven incremental compilation, i.e. can they use the same thing? And a final question, as a side discussion:
I haven't looked at the PR yet but there's no a priori reason it wouldn't work for Gradle or any other build tool.
Generally speaking, Maven does nothing; it's as dumb as Ant or Make. Developers generally rely on their IDE for incremental compilation (hoping it does the same thing 🤷).
It needs to dump the full graph information: all interfaces implemented, all classes extended, all methods and properties referenced. We also need the annotation metadata that might impact the transpilation graph - i.e. when JsInterop changes the emitted JS name for a class, field or method. Right now a lot of the information is removed via `if (isJsAccessible(referencedType)) {`. If we can get all that information dumped, that would be great. Also, maybe we could configure protobuf so the output could be pure JSON or a zipped binary format? The latter would ensure we load this information as quickly as possible, so that each refresh is faster.

The other tricky part is how to keep this data whole and up to date, yet handle partial incremental transpilations - where only a subset of .java files is passed for translation at any given time, except the first time, when it must receive all of them. If you look at the code, I read in the previous iteration of the graph that was saved to disk, remove just the types being updated via transpilation, re-insert them back into the graph, and save the graph as a whole.
This could be done by the plugins as a post-J2CL step, right? (with the text/CSV-style file being plugin-specific then: read the LibraryInfo and merge it into the cache file)
Yes, I believe it should be possible to merge a more recent partial graph over the whole-world version. It would mean potentially two files on disk: the whole world and the partial-world output from J2CL.
AFAIK, library info is already emitted in binary format here. The JSON text is just for debugging and used only in golden file tests.
Thanks @mdproctor. I meant to reply to @tbroyer's response and follow up with my summary of the options to make sure we are on the same page, but I haven't been able to come back to it yet. And as Roberto pointed out, yes, we use proto binary; it is quite a compact and platform-independent format.
@gkdn if you are able to make the changes in a branch, I can try and port my test suite to it. I did look at doing the changes myself; they seem quite minimal, but I'm not sure I'd do it in the way you'd want. Don't forget we will need the qualified Java and qualified JS names, to determine if a JsInterop annotation changed those.
Looking over the LibraryInfo, I can see it stores the qualified JS name, but not the qualified Java name. Also, we do not have any of the JsInfo. The aim is to be able to determine if any Java class, property or method has its namespace or type renamed when outputting to JavaScript - not just that "getJsInfo().getJsName()" exists, as it could be specified with the same name, in which case it will not result in an actually different name.
Correct me if I'm wrong, but what's needed is the Java source file name, not so much the Java type name. What's needed is a way to compute which other file needs to be j2cl'd when a given file is modified.
"What's needed is a way to compute which other file needs to be j2cl'd when a given file is modified." |
Why do you need JsInfo, the qualified JS name, or changes to the namespace in JsType for incremental compilation?
Take the example below: if I refactor the Java name of A, then B will be changed too; both timestamps are updated and thus both are part of the submitted changeset. So I rely on timestamps to build the submitted changeset, rather than scanning the entire call graph. This works fine. But if I change the JsType name on A, without changing the Java class name of A, then the generated JS for B will be stale, as the IDE will not perform any updates on B - so this must be detected and B added to the ChangeSet. So in my code I was checking whether the JsInfo was doing any renaming, and whether it was different from the previous time this class was visited.
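The snippet referred to isn't reproduced above; a minimal sketch of the situation being described, with illustrative names, could be:

```java
// A.java: renaming the type on the JS side (e.g. name = "Account" -> name = "Customer")
// changes the generated JavaScript identifier, but A.java is the only file edited.
import jsinterop.annotations.JsType;

@JsType(name = "Account")
public class A {
  public String id() {
    return "a-1";
  }
}
```

```java
// B.java: untouched by the rename, so its timestamp does not change and a purely
// timestamp-based changeset would miss it, leaving B's generated JS pointing at
// the old JS name for A.
public class B {
  public String describe() {
    return new A().id();
  }
}
```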
IIUC what @tbroyer was asking for; we will give you the list of things that depend on However if you are saying that |
(finally had the time to send the reply I promised earlier) Thanks @tbroyer, you are a gem; a walking Encyclopédie of build tools 🙂 OK, IIUC, we have the following options:
Option 1 is not very complicated; as you pointed out, it can be done with a tool that uses ASM or JDT. Option 2 doesn't make much sense since using J2CL is not valuable for this purpose. Option 3 is OK, as I said earlier. Option 4 is probably overkill at the moment; maybe we can eventually add it by moving some of your Gradle code.
And for option 3, we would need to output something like:
Since this is simple and mostly text anyway, we can go with a simple text file emitted per source instead of using a binary proto. So the final outputs will look like:
where
We don't have the bandwidth for this, but it should be trivial to add.
"Since this is simple and mostly text anyway, we can go with a simple text file emitted per source instead of using a binary proto." |
You need a lot more than the JsInfo annotation for that. And that is really hard to pull off correctly in a minimal way, e.g. a base class change may result in new bridges, so class And does it really matter in practice? (considering transpilation time vs. bundling + downloading + parsing) Regardless, I think we should just start with something simple here.
If we retranspile all direct (not transitive) callers (no optimisations), from what I can see, isn't that information in the LibraryInfo already?
LibraryInfo is a call graph. The reverse of it doesn't necessarily give you what to compile when something changes, since new APIs can add new edges to the call graph. Look at the following, just as an example:
Then introduce a new API member; it adds edges that weren't in the previous graph. So if there is an API change, it is best (and also easy) to compile all the deps of the file. If you know there is only a change in a body, you don't need to re-compile anything other than that class - that scenario will be the majority of user changes anyway during an edit-refresh cycle...
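The sample code isn't preserved above; a hypothetical reconstruction of that kind of scenario, using the A/B/Zoo names that come up later in the thread, could be:

```java
// Hypothetical reconstruction, not the original sample.
class A {
  void foo() {}
}

class B extends A {
  // No foo() override yet: the call below resolves to A.foo(), so the call
  // graph only records an edge Zoo -> A.foo.
}

class Zoo {
  void run(B b) {
    b.foo();
  }
}

// Introducing an override in B:
//
//   class B extends A {
//     @Override
//     void foo() {}
//   }
//
// makes the call in Zoo resolve to B.foo, an edge that did not exist in the
// previous call graph, so simply reversing that graph would not flag Zoo.
```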
OK, works for me; happy to start with something simple and robust and expand from there if needed. As you say, we can still capture signature changes ourselves, and avoid retranspilation if they do not occur.
@gkdn I compiled your sample code with JavaC, and it turns out Java puts
Could the call graph then contain Regarding the rest of the discussion: you cannot know what part of |
Specifically for LibraryInfo: RTA works on normalized code, so generally speaking it is not a good idea to provide more raw data, since it would require having more logic, and that's something we don't want. Also, generating data that is not normalized is not a very practical option for us since we don't track the origin. That also applies to the proposal that I made in #99 (comment).
Are you talking about minimizing recompiles by detecting that only a "method body" has changed, right?
If A is changed, you would need to mark B as dirty as well, and that would also make Zoo dirty, right?
@mdproctor @tbroyer Could you maybe create a document that extracts the different re-compile scenarios (including corner cases) and your proposal to handle them (including the expected output from J2CL), so we can comment on it and help you out?
I have a POC working and integrated with the Maven plugin, and it seems to work. The commit can be seen here; please ignore the Vertispan Maven stuff. It produces a build.map per class (and per inner class), using a format similar to what you described:
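The listing isn't reproduced above; going by the surrounding description (dependencies plus the hierarchy, interfaces, and enclosing type discussed below), a per-class entry might look roughly like this, purely as an illustrative layout rather than the actual format:

```
# build.map entry for p.B (illustrative only)
type:       p.B
source:     src/p/B.java
extends:    p.A
implements: java.io.Serializable
enclosing:  -
deps:       p.A, java.lang.String
```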
To make the implementation, for now, I avoided trying to change existing code. I cloned LibraryInfoBuilder, removed the pruning, and added the code to dump the build map. I updated the generator class to use this if -buildMap is present. I had to make another addition to the proto to record both the existing target enclosing type (which it already does) and also the instance type. It needs the latter to make sure any class impacted by a change in the hierarchy due to a method override is picked up - when a class is changed, we iterate its descendants and add each of those to the change set. Once people are happy with this approach and the format of the build.map file, we can look at the code and how best to incorporate it. Do we introduce the behaviour directly into LibraryInfoBuilder, or do we keep the two separate but work towards more code re-use?
Sorry for the late reply. Yes, it is a good idea to iterate on a separate pass for now, and we can decide later how best to incorporate the output. Why are the hierarchy and interfaces explicitly defined instead of simply added as a dependency? In my proposal, I assumed the flow would be the one sketched earlier in this thread; that process will be optimized on the plugin side by diffing the structure/signature of classes and not doing the dependency traversal when they are stable, since most changes won't affect them.
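A rough sketch of that plugin-side optimization (hypothetical helper, not code from this thread): fingerprint each class's externally visible structure and only walk the reverse dependencies when the fingerprint changes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.TreeSet;

final class ApiFingerprint {

  /**
   * Builds a stable fingerprint from a class's member signatures (e.g. as read
   * from the dumped per-class file). If it is unchanged since the last build,
   * only the class itself needs retranspiling and its dependants can be skipped.
   */
  static String of(List<String> memberSignatures) {
    String canonical = String.join("\n", new TreeSet<>(memberSignatures));
    try {
      byte[] digest =
          MessageDigest.getInstance("SHA-256").digest(canonical.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The plugin would compare the stored fingerprint of each retranspiled class with the new one, and only expand the changeset through the reverse dependency graph when they differ.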
Is your feature request related to a problem? Please describe.
For accurate incremental transpiling, external tooling (Maven, Gradle, etc.) needs to determine which files to re-transpile when a given file changes.
The mapping from source file to generated files is needed too (naming conventions are not enough: it's not unusual for a Java file to contain multiple types, even several top-level types, and source files aren't even required to follow the common naming convention based on the package and type names), to accurately clean up the output directory when a source file is deleted.
J2CL also copies source files (`.java` and `.js`) to the output, possibly relocating them, so the mapping from source path to output path is needed as well.

Even better if it could also map to JARs or directories in the `-classpath` for external dependencies, such that if the JAR (or a class in the directory) changes, the build tool could re-process all files that depend on it (and possibly on all following path entries in the classpath, in case a class is now shadowed, or a class that was shadowed no longer is; if it could map to the actual class file, this could be made even more precise by detecting those shadowing cases). Otherwise, each change to the classpath would mean reprocessing all files. Gradle, for instance, does such processing for its incremental `javac`.
For example, given two source files `src/p/A.java` and `src/p/B.java`, where `p.A` references `p.B`:
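The two snippets aren't preserved in this copy; a minimal sketch consistent with the description below (A constructing a B, and B later gaining a `B(int)` overload) might be:

```java
// src/p/A.java (illustrative sketch, not the original example)
package p;

public class A {
  private final B b = new B(0); // resolves to B(long) today

  public B getB() {
    return b;
  }
}
```

```java
// src/p/B.java (illustrative sketch, not the original example)
package p;

public class B {
  public B(long value) {}

  // The modification discussed below: adding a B(int) overload changes which
  // constructor `new B(0)` in A resolves to, so A's output changes as well.
}
```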
If `src/p/B.java` is modified, then we'll want to reprocess both `src/p/B.java` and `src/p/A.java`, because `p.B` would have had its API modified in a way that would change `p.A`'s output (e.g. here adding a `B(int)` constructor overload).

Processing those 2 files will result in 8 files being generated or copied:
- `out/p/A.java` (copied from the sources)
- `out/p/A.java.js`
- `out/p/A.impl.java.js`
- `out/p/A.js.map`
- `out/p/B.java` (copied from the sources)
- `out/p/B.java.js`
- `out/p/B.impl.java.js`
- `out/p/B.js.map`
If `src/p/A.java` is deleted, we'll want to delete all the `out/p/A.*` files.

Now if a `src/p/A.native.js` is created, then `src/p/A.java` would have to be reprocessed, and then again if `src/p/A.native.js` is deleted. But because J2CL uses a naming convention here, this does not need to appear in the mapping (correct me if I'm wrong).

If `src/p/B.java` is modified to add an inner class, it would generate 3 new files (`out/p/B$Inner.java.js`, `out/p/B$Inner.impl.java.js`, and `out/p/B$Inner.js.map`). If `src/p/B.java` is then deleted, then all 7 files need to be deleted, and `src/p/A.java` be reprocessed (which would fail, unless it was also modified to remove its dependency on `p.B`).

Describe the solution you'd like
Define a stable file format that `J2clCommandLineRunner` could output for use by external (non-Bazel) tooling. It needs to contain 2 things:
1. the dependencies between source files: `src/p/A.java` depends on `src/p/B.java`; this could use types as an indirection (e.g. type `p.A` comes from `src/p/A.java` and references type `p.B` that comes from `src/p/B.java`, and `foo.Bar` that comes from `jar:foo.jar!/foo/Bar.class`)
2. the source file(s) (a `.java` and a `.native.js`) for each output file

In an incremental use of the transpiler, the build tool would have to merge the generated mapping file with the previously known mapping (removing entries from that global mapping when a file is deleted).
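A small sketch of that merge step, with assumed shapes for the global mapping (hypothetical names: source file to dependencies, and source file to outputs):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class MappingStore {

  // Global, persisted view: source file -> files it depends on.
  private final Map<String, Set<String>> dependencies = new HashMap<>();
  // Global, persisted view: source file -> generated or copied outputs.
  private final Map<String, Set<String>> outputs = new HashMap<>();

  /** Overlays the mapping produced by an incremental transpile. */
  void merge(Map<String, Set<String>> newDeps, Map<String, Set<String>> newOutputs) {
    dependencies.putAll(newDeps);
    outputs.putAll(newOutputs);
  }

  /** Drops a deleted source file and returns the stale outputs to remove on disk. */
  Set<String> remove(String sourceFile) {
    dependencies.remove(sourceFile);
    Set<String> stale = outputs.remove(sourceFile);
    return stale == null ? Set.of() : stale;
  }
}
```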
Describe alternatives you've considered
J2CL outputs a source map file (`.js.map`) for each generated file (pair of `.java.js` and `.impl.java.js`), containing the path of the source; but that references the source that J2CL copied to the output, not the actual path to the source (as can be seen in `bazel-bin/third_party/jbox2d.js.zip` vs `bazel-bin/third_party/libjbox2d-src.jar` after a `bazel build //third_party:jbox2d.js.zip`: the `.js.zip` contains a `java/lang/StrictMath.js.map` referencing, using a relative path, `java/lang/StrictMath.java`, but the actual source in the `-src.jar` was `external/org_jbox2d/src/main/java/org/jbox2d/gwtemul/java/lang/StrictMath.java`), because it's targeted at tools that need to access those `.java` files.

Additional context
J2CL internally already has all the information; it populates a `LibraryInfo` protobuf that it can serialize to a file (for later use by RTA), but the flag to do so is not exposed to `J2clCommandLineRunner` (probably to keep the `LibraryInfo` as an implementation detail of J2CL). Just like the source map though, the library_info does not contain the actual source path either (modifying `third_party/BUILD` to pass `readable_library_info = True` to `//third_party:jbox2d` generates a `library_info_debug.json` in the `.js.zip`, but it does not contain the full `src/main/java/org/jbox2d/gwtemul/java/lang/StrictMath.java` path).

The Kythe indexing metadata includes the information though (I modified `build_defs/internal_do_not_use/j2cl_common.bzl` to unconditionally pass `-generatekytheindexingmetadata`), so J2CL has all the needed information to generate such a mapping file.