Automated experiments on the compiler pipeline #3973
Comments
I had a look at the duplicate pairs of stages. For reference, here's how the stages are set up currently: `k/kernel/src/main/java/org/kframework/backend/kore/KoreBackend.java`, lines 248 to 259 at commit `1e4aa2b`.
And here's after I moved what I could:
The regression tests still pass with this setup. The definitions in the profiling tests also kompile without issue, and identical
I've compared the
Here's the script I've been using:

```shell
#!/usr/bin/bash
set -exuo pipefail

PWD="$(pwd)"
K_BIN="${PWD}/k-distribution/bin"
LOG_DIR="${PWD}/bisect-log"
COMMIT=$(git rev-parse --short HEAD)

trap "${K_BIN}/stop-kserver || true" INT TERM EXIT

mkdir -p "${LOG_DIR}"
${K_BIN}/stop-kserver || true # make sure no kserver is running
mvn package -DskipTests -Dproject.build.type=Debug -T1C
${K_BIN}/kserver &> /dev/null &

cd k-distribution/tests/regression-new
make clean
make -j8 -O |& tee "${LOG_DIR}/${COMMIT}.out"
```
We should carry on with this line of investigation; we could do the following:
#3973: This factors a few compiler pipeline stages out from between a pair of duplicate stages. If we're able to eliminate the duplicate stages, this change will help narrow down what changes need to be made to do that. Co-authored-by: rv-jenkins <admin@runtimeverification.com>
Here's another experiment I'm working on. The idea is to take the stage at the end of the pipeline and pull it as far as possible to the front. Here's a snippet that replaces the end of the
The stages are held in a Java list and composed together. I have more stages to check, but here's what I've found so far.
One thing to note is that this only tests for breakages in the pipeline during compilation; further testing is needed in the regression tests and downstream for any changes that we might want to make.
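To make the "list of stages composed together" shape concrete, here's a minimal sketch of folding a Java list of stages into one pipeline function. The `String` stand-in for the definition and the stage names are illustrative assumptions, not the frontend's actual types or passes.

```java
import java.util.List;
import java.util.function.Function;

public class PipelineSketch {
    // Fold the stages left-to-right with andThen, starting from identity,
    // so reordering experiments are just reorderings of the input list.
    public static Function<String, String> compose(List<Function<String, String>> stages) {
        Function<String, String> pipeline = Function.identity();
        for (Function<String, String> stage : stages) {
            pipeline = pipeline.andThen(stage);
        }
        return pipeline;
    }

    public static void main(String[] args) {
        // Hypothetical stages that just record that they ran.
        Function<String, String> pipeline = compose(List.of(
                d -> d + "+resolveAnonVars",
                d -> d + "+expandMacros"));
        System.out.println(pipeline.apply("def")); // prints def+resolveAnonVars+expandMacros
    }
}
```

Moving a stage earlier in the list is then a one-line diff, which is what makes the bisect-per-commit experiment cheap to set up.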
This is very cool!
Perhaps open a PR that moves these to the front of the list explicitly, and adds a comment noting that they have no dependencies on later passes and are designed to apply directly to user code. If there's any low-hanging fruit here in terms of code cleanup, refactoring, etc., then it might be a good time to do so.
Indeed - as we said in the K meeting, we don't necessarily have an a priori assumption that it will be correct to move everything around. If there's something with non-obvious dependencies like this one, then it's perfectly fine to leave it out and come back to refactoring it in the future if need be.
It would be nice if we could do these experiments without having to rebuild K. That would require making it so that the compiler passes we choose to include can be set on the CLI instead of directly hardcoded in this file. Would it be possible to attach a string identifier to each compiler pass, and then build up this list of passes dynamically instead of statically? That would also allow for much easier downstream testing of moving the compiler passes around and figuring out their dependencies. Additionally, we could then also have tests that only run a part of the compiler pipeline (perhaps loading the JSON KAST, running a compiler pipeline step, then dumping the resulting JSON KAST), which would be a much more tailored test than the end-to-end integration tests we currently have.
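The "string identifier per pass" idea could be sketched as a small registry that builds the pipeline from a CLI-style comma-separated list. This is a hedged illustration: the class name, the `String` stand-in for the definition, and the error handling are all assumptions rather than the frontend's real API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class PassRegistry {
    // Registration order is preserved, but build() follows the spec string.
    private final Map<String, Function<String, String>> passes = new LinkedHashMap<>();

    public void register(String name, Function<String, String> pass) {
        passes.put(name, pass);
    }

    // Build a composed pipeline from e.g. "resolveComm,expandMacros",
    // rejecting unknown names so typos fail fast instead of silently.
    public Function<String, String> build(String spec) {
        Function<String, String> pipeline = Function.identity();
        for (String name : spec.split(",")) {
            Function<String, String> pass = passes.get(name.trim());
            if (pass == null) {
                throw new IllegalArgumentException("unknown pass: " + name);
            }
            pipeline = pipeline.andThen(pass);
        }
        return pipeline;
    }
}
```

With something like this, reordering or dropping passes becomes a CLI argument rather than a rebuild of K.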
A couple of things I noticed when looking through the code that builds the compilation pipeline:
I think the LLVM optimisation pipeline infrastructure is a pretty good place to look for design inspiration. Some features there that are worth looking at:
Some more information from an investigation with @gtrepta: overall kompile pipeline (in
We could pull applying sort synonyms and implicit IO imports out of `parseDefinition` into its own explicit step, and then call that step directly. Then we would be breaking the entire kompile pipeline into major "epochs", which could be made more fine-grained over time (as opposed to the current approach, where we were thinking of starting with the pipeline at the granularity of individual steps and understanding each step). Proposed rough sketch:
Then we slowly break each chunk of the kore steps out into other atomic units that must be run together. This is just an idea of how to attack the problem from the other direction. Basically, here, we know that the "dependency/use" chain is a straight line, not a DAG, because of how the compiler is currently structured. At the very least, we could pull out the
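The epoch idea above can be sketched as an ordered list of coarse, named chunks, each an atomic unit whose internal steps run together. The epoch names and the `String` stand-in for the definition are hypothetical; the point is only the shape of the straight-line chain.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class Epochs {
    // Each epoch is an atomic chunk of steps that must run together.
    public record Epoch(String name, UnaryOperator<String> run) {}

    public static String runAll(List<Epoch> epochs, String definition) {
        for (Epoch e : epochs) {
            definition = e.run().apply(definition);
            // Between epochs is a natural point to dump intermediate state
            // or check invariants, since the chain is a line, not a DAG.
        }
        return definition;
    }

    public static void main(String[] args) {
        // Hypothetical epoch names; the real pipeline's chunks would differ.
        List<Epoch> epochs = List.of(
                new Epoch("outer-parsing", d -> d + ">parsed"),
                new Epoch("kore-steps", d -> d + ">kore"));
        System.out.println(runAll(epochs, "def")); // prints def>parsed>kore
    }
}
```

Epochs could then be split into finer units over time without changing the driver.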
As suggested by @ehildenb, I'll also work on recording all the attribute read-write dependencies between passes. This will help to clean up the attribute usages in general and give us a subset of the overall pass dependency DAG.
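One way to turn recorded read/write sets into DAG edges: pass B depends on an earlier pass A whenever A writes an attribute that B later reads. The pass and attribute names below are hypothetical examples, not a survey of the real passes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class AttrDeps {
    // A pass annotated with the attributes it reads and writes.
    public record Pass(String name, Set<String> reads, Set<String> writes) {}

    // Derive dependency edges induced by attribute flow between passes.
    public static List<String> edges(List<Pass> pipeline) {
        List<String> deps = new ArrayList<>();
        for (int i = 0; i < pipeline.size(); i++) {
            for (int j = i + 1; j < pipeline.size(); j++) {
                Pass a = pipeline.get(i), b = pipeline.get(j);
                Set<String> shared = new TreeSet<>(b.reads());
                shared.retainAll(a.writes());
                if (!shared.isEmpty()) {
                    deps.add(a.name() + " -> " + b.name() + " via " + shared);
                }
            }
        }
        return deps;
    }
}
```

The resulting edge list is only a subset of the true dependency DAG (passes can also depend on each other through the term structure), which matches the caveat above.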
#3973: This defers adding K-IO to the main module from parsing to when the coverage instrumentation gets generated. It also makes GenerateCoverage's methods static, and moves the check for the coverage option outside of it, so that generating coverage just turns into a no-op if the flag isn't there. There still needs to be a check in DefinitionParsing on the coverage flag to keep the K-IO module around at the step where unused modules get trimmed from the definition.
Next steps, after discussing with @ehildenb:
This will allow faster experimentation with different orderings of the transformations to find dependencies, and will make it easier to do these sorts of experiments on downstream semantics.
This, along with the previous option, allows us to generate isolated tests for each compiler pass with input/output JSONs. We can also get the input and output of each individual step in the pipeline and see the actual changes it's making, or whether it's making any changes at all, depending on what's in the definition. This could also be used to help integrate pyk kompile with the Java frontend, as pyk can do the outer parsing and then pass that to kompile, where inner parsing and pipeline transformations can be done.
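Capturing each step's input and output could look like the wrapper below, which is the raw material both for isolated per-pass tests and for spotting no-op stages. Serialisation is stubbed out with plain strings; in the real flow these would be the KAST JSON dumps, and the names here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class StepRecorder {
    // Input/output snapshot for one step; changed() flags no-op stages.
    public record Snapshot(String step, String before, String after) {
        public boolean changed() { return !before.equals(after); }
    }

    public final List<Snapshot> snapshots = new ArrayList<>();

    // Wrap a step so every application records its input and output.
    public UnaryOperator<String> wrap(String name, UnaryOperator<String> step) {
        return def -> {
            String out = step.apply(def);
            snapshots.add(new Snapshot(name, def, out));
            return out;
        };
    }
}
```

Dumping each snapshot pair to disk would give exactly the input/output JSON fixtures described above.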
Here's a spreadsheet of every regression test, each pipeline stage, and whether that stage made a change to the definition (1) or didn't make any change (0). https://docs.google.com/spreadsheets/d/1Jq8XYdNpzTv9XPXQZx1cOTRKBNWsq1fmz3HH4Ipv0Cs/edit?usp=sharing
#3973: Adds an option `--kore-backend-steps` to `kompile` which takes a list of compilation steps to run in the kore backend pipeline. Omitting this option is equivalent to `resolveComm,resolveIO,resolveFun,resolveFunctionWithConfig,resolveStrict,resolveAnonVars,resolveContexts,numberSentences1,resolveHeatCoolAttribute,resolveSemanticCasts,subsortKItem1,constantFolding,propagateMacroToRules,guardOrs,resolveFreshConfigConstants,generateSortPredicateSyntax1,generateSortProjections1,expandMacros,addImplicitComputationCell,resolveFreshConstants,generateSortPredicateSyntax2,generateSortProjections2,checkSimplificationRules,subsortKItem2,addStrategyCellToRules,addStrategyRuleToMainModule,concretizeCells,genCoverage,addSemanticsModule,resolveConfigVar,addCoolLikeAtt,markExtraConcreteRules,removeAnywhereRules,generateSortPredicateRules,numberSentences2`
Here's another spreadsheet. Each column represents a stage in the kore backend pipeline that was dropped from compilation, and each row is a test in the regression suite with its test result after the stage was dropped. Each cell says whether the test passed (PASS), failed (FAIL), or encountered an error during compilation (KOMPILE). https://docs.google.com/spreadsheets/d/13jDHr49_EnIOSnhlKzbvpm8FXN620itrZBywmLuKRJs/edit?usp=sharing
Nice! So from that second spreadsheet, any fully-green columns indicate that the compiler pass perhaps isn't being exercised properly by the regression test suite? Even if the internal data structures get changed, we still aren't observing any actual behavioural differences at the level of the test suite. It would be interesting to see if we can write a test for each green column that breaks if the pass is removed; doing so would also be a good way of understanding more concretely at the K source level what the passes affect. |
That's my interpretation as well. I see:
These, at least, are planned for removal.
I've looked at It wraps |
@dwightguth doesn't see why
This brings up something I'm noticing about invariants. AFAIK, every sort being a subsort of KItem is supposed to be an invariant, but we're enforcing this by explicitly creating the syntax declarations. This creates the phenomenon of the invariant being broken by a stage, staying broken for some intermediate stages, and then being fixed by another stage. This seems less than ideal from a maintainability standpoint: one would want to rely on every invariant they know about a K definition, but instead needs to keep track of whether or not they're in a place where they can actually use that invariant. Keeping these invariants from being broken, either with assertion checks or by making some of them hold implicitly, could help out here.
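The assertion-check option could be as simple as wrapping each pass so a stated invariant is re-checked after it runs. This is a sketch under assumptions: the predicate here stands in for "every sort is a subsort of KItem", and in the real frontend such checks would presumably be enabled only in debug builds.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class InvariantCheck {
    // Wrap a pass so the invariant is re-checked after the pass runs,
    // pointing at the first stage that breaks it instead of a later one.
    public static UnaryOperator<String> checked(String stage,
                                                UnaryOperator<String> pass,
                                                Predicate<String> invariant) {
        return def -> {
            String out = pass.apply(def);
            if (!invariant.test(out)) {
                throw new IllegalStateException("invariant broken after " + stage);
            }
            return out;
        };
    }
}
```

A failure then names the offending stage directly, rather than surfacing many stages later when something relies on the invariant.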
Here's a list of dependencies between compilation passes that I've found. I will continue to update it as I find them.
I've added a second sheet to the spreadsheet for results from dropping pipeline steps. This one covers the regression tests that expect a failure with an error message (my earlier methodology couldn't cover these tests). If a cell says
The compiler pipeline in the frontend has grown gradually over the lifetime of the project, and the purpose for each stage and their dependencies on each other has become unclear. Some stages have little to no documentation, and are possibly unneeded or more complex than they need to be.
An investigation that pokes at each of these stages, removing them or moving them around and seeing whether any tests break and why, can shed some light on the pipeline.
One technique for doing this is with `git bisect`. A sequence of commits on top of `develop` that each make a single, small change to the pipeline can be fed to `git bisect run` with a script that rebuilds the frontend and runs the tests. `git bisect` will then find the first commit that breaks something, giving us changes before it that can be made, and a breaking change that we can investigate to help fill in missing documentation or make a refactoring.