Automated experiments on the compiler pipeline #3973

Open · gtrepta opened this issue Feb 9, 2024 · 25 comments

@gtrepta (Contributor) commented Feb 9, 2024

The compiler pipeline in the frontend has grown gradually over the lifetime of the project, and the purpose of each stage and the dependencies between stages have become unclear. Some stages have little to no documentation, and some are possibly unneeded or more complex than they need to be.

We can shed some light on the pipeline by poking at each of these stages: removing them or moving them around, seeing whether any tests break, and understanding why they break.

One technique for doing this is git bisect. A sequence of commits on top of develop, each making a single small change to the pipeline, can be fed to git bisect run together with a script that rebuilds the frontend and runs the tests. git bisect then finds the first commit that breaks something, which gives us the changes before it that can safely be made, plus a breaking change that we can investigate to help fill in missing documentation or guide a refactoring.

@gtrepta (Contributor, Author) commented Feb 9, 2024

I had a look at the duplicated pair of generateSortPredicateSyntax and generateSortProjections stages in the pipeline, to see how many of the stages between them I could move out.

For reference, here's how the stages are set up currently:

.andThen(generateSortPredicateSyntax)
.andThen(generateSortProjections)
.andThen(constantFolding)
.andThen(propagateMacroToRules)
.andThen(expandMacros)
.andThen(checkSimplificationRules)
.andThen(guardOrs)
.andThen(AddImplicitComputationCell::transformDefinition)
.andThen(resolveFreshConfigConstants)
.andThen(resolveFreshConstants)
.andThen(generateSortPredicateSyntax)
.andThen(generateSortProjections)

And here's after I moved what I could:

            .andThen(constantFolding)
            .andThen(propagateMacroToRules)
            .andThen(guardOrs)
            .andThen(resolveFreshConfigConstants)
            .andThen(generateSortPredicateSyntax)
            .andThen(generateSortProjections)
            .andThen(expandMacros)                                    //
            .andThen(AddImplicitComputationCell::transformDefinition) // Only these three remain
            .andThen(resolveFreshConstants)                           //
            .andThen(generateSortPredicateSyntax)
            .andThen(generateSortProjections)
            .andThen(checkSimplificationRules)

The regression tests still pass with this setup. The definitions in the profiling tests also kompile without issue, and the generated definition.kore files are identical to those on develop. So it feels pretty safe to make a PR, but I'll try some more testing downstream, since the symbolic backends aren't exercised very much here.

@gtrepta (Contributor, Author) commented Feb 12, 2024

I've compared the definition.kore files generated on evm-semantics and wasm-semantics and they're identical. I also compared all of the definitions in the regression tests; they differ, but only in the ordering and numbering of some sentences. I feel confident enough to open a PR for this and leave the decision to merge it up to review.

@gtrepta (Contributor, Author) commented Feb 12, 2024

Here's the script I've been using for git bisect run to build and run tests:

#!/usr/bin/bash

set -exuo pipefail

PWD="$(pwd)"
K_BIN="${PWD}/k-distribution/bin"
LOG_DIR="${PWD}/bisect-log"
COMMIT=$(git rev-parse --short HEAD)

trap "${K_BIN}/stop-kserver || true" INT TERM EXIT
mkdir -p "${LOG_DIR}"
${K_BIN}/stop-kserver || true # make sure no kserver is running

mvn package -DskipTests -Dproject.build.type=Debug -T1C

${K_BIN}/kserver &> /dev/null &

cd k-distribution/tests/regression-new

make clean
make -j8 -O |& tee "${LOG_DIR}/${COMMIT}.out"

@Baltoli (Collaborator) commented Feb 13, 2024

We should carry on this line of investigation; we could do the following:

  • Chunk up the pipeline and see which passes depend on each other implicitly
  • Check that passes are idempotent, i.e. f(f(x)) == f(x) (see the sketch after this list)
  • Look at more duplicated passes (number sentences, ...)
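
A minimal sketch of the idempotency check, assuming passes are exposed as scala.Function1<Definition, Definition> values the way KoreBackend.steps() composes them, and that Definition has structural equality; the helper below is hypothetical, not existing K code.

    // Hypothetical helper, not existing K code: report whether a single pass is
    // idempotent on a given definition.
    static boolean isIdempotent(Function1<Definition, Definition> pass, Definition def) {
      Definition once = pass.apply(def);
      Definition twice = pass.apply(once);
      // Relies on Definition's structural equality (Scala case-class equals).
      return once.equals(twice);
    }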

rv-jenkins added a commit that referenced this issue Feb 13, 2024
#3973 

This factors a few compiler pipeline stages out from between a pair of
duplicate stages. If we're able to eliminate the duplicate stages, this
change will help narrow down what changes need to be made to do that.


@gtrepta (Contributor, Author) commented Feb 16, 2024

Here's another experiment I'm working on. The idea is to take the stage at the end of the pipeline and pull it as far as possible to the front. Here's a snippet that replaces the end of the steps method in KoreBackend:

    List<Function1<Definition, Definition>> defTransformerList = new ArrayList<>();
    defTransformerList.add(resolveComm);
    defTransformerList.add(resolveIO);
    defTransformerList.add(resolveFun);
    defTransformerList.add(resolveFunctionWithConfig);
    defTransformerList.add(resolveStrict);
    defTransformerList.add(resolveAnonVars);
    defTransformerList.add(d -> new ResolveContexts().resolve(d));
    defTransformerList.add(numberSentences);
    defTransformerList.add(resolveHeatCoolAttribute);
    defTransformerList.add(resolveSemanticCasts);
    defTransformerList.add(subsortKItem);
    defTransformerList.add(constantFolding);
    defTransformerList.add(propagateMacroToRules);
    defTransformerList.add(guardOrs);
    defTransformerList.add(resolveFreshConfigConstants);
    defTransformerList.add(generateSortPredicateSyntax);
    defTransformerList.add(generateSortProjections);
    defTransformerList.add(expandMacros);
    defTransformerList.add(AddImplicitComputationCell::transformDefinition);
    defTransformerList.add(resolveFreshConstants);
    defTransformerList.add(generateSortPredicateSyntax);
    defTransformerList.add(generateSortProjections);
    defTransformerList.add(checkSimplificationRules);
    defTransformerList.add(subsortKItem);
    defTransformerList.add(d -> new Strategy().addStrategyCellToRulesTransformer(d).apply(d));
    defTransformerList.add(
        d -> Strategy.addStrategyRuleToMainModule(d.mainModule().name()).apply(d));
    defTransformerList.add(d -> ConcretizeCells.transformDefinition(d));
    defTransformerList.add(genCoverage);
    defTransformerList.add(Kompile::addSemanticsModule);
    defTransformerList.add(resolveConfigVar);
    defTransformerList.add(addCoolLikeAtt);
    defTransformerList.add(markExtraConcreteRules);
    defTransformerList.add(removeAnywhereRules);
    defTransformerList.add(generateSortPredicateRules);
    defTransformerList.add(numberSentences);

    return def -> {
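      // pos indexes the stage being pulled forward; it starts at the second-to-last
      // position in the list (just before the final numberSentences).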
      int pos = defTransformerList.size() - 2;

      Definition result = null;

      while (pos > 0) {
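        // Move the stage under test one position earlier in the list, then re-run the
        // entire pipeline from scratch on the original definition.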
        Function1<Definition, Definition> toMove = defTransformerList.remove(pos--);
        defTransformerList.add(pos, toMove);

        try {
          result = defTransformerList.stream().reduce(x -> x, (a, f) -> a.andThen(f)).apply(def);
        } catch (KEMException e) {
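          // Rethrowing aborts the experiment at the first position where compilation breaks.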
          throw e;
        }
      }

      return result;
    };
  }

The stages are held in a Java list, composed together with a reduce operation, and then run on the input definition. This is done repeatedly, moving the stage in question up by one position in the list each time.

I have more stages to check, but so far I've found that removeAnywhereRules and markExtraConcreteRules can be pulled all the way to the front of the pipeline without breaking it. I think these are only supposed to operate on user-supplied rules, so it makes sense that they work fine before any transformations that add new rules.

addCoolLikeAtt can be pulled to the front, but I'm less sure that this would be correct. cool-like is meant to be added to rules whose k cell holds a K sequence with a variable at the top. It's an attribute that the llvm-backend matching-tree generation uses to apply a special priority that optimizes the tree. Aside from cooling rules that use the cool attribute, I'm not sure what other kinds of generated rules could appear that would be changed by this pass.

One thing to note is that this only tests for breakages in the pipeline during compilation; further testing is needed in the regression tests and downstream for any changes that we might want to make.

@Baltoli (Collaborator) commented Feb 16, 2024

This is very cool!

> I have more stages to check, but so far I've found that removeAnywhereRules and markExtraConcreteRules can be pulled all the way to the front of the pipeline without breaking it. I think these are only supposed to operate on user-supplied rules, so it makes sense that they work fine before any transformations that add new rules.

Perhaps open a PR that moves these to the front of the list explicitly, and adds a comment noting that they have no dependencies on later passes and are designed to apply directly to user code. If there's any low-hanging fruit to be done here in terms of code cleanup, refactoring etc. then it might be a good time to do so.

> addCoolLikeAtt can be pulled to the front, but I'm less sure that this would be correct. cool-like is meant to be added to rules whose k cell holds a K sequence with a variable at the top. It's an attribute that the llvm-backend matching-tree generation uses to apply a special priority that optimizes the tree. Aside from cooling rules that use the cool attribute, I'm not sure what other kinds of generated rules could appear that would be changed by this pass.

Indeed: as we said in the K meeting, we don't necessarily have an a priori assumption that it will be correct to move everything around. If there's something with non-obvious dependencies like this one, then it's perfectly fine to leave it out and come back to refactoring it in the future if need be.

@ehildenb (Member) commented Feb 22, 2024

It would be nice if we could do these experiments without having to rebuild K. That would require making it so that the compiler passes we choose to include can be set on the CLI instead of being hardcoded directly in this file.

Would it be possible to attach a string identifier to each compiler pass, and then build up this list of passes dynamically instead of statically? That would also allow for much easier downstream testing of moving the compiler passes around and figuring out their dependencies. Additionally, we could then also have tests that only run part of the compiler pipeline (perhaps loading the JSON KAST, running a compiler pipeline step, then dumping the resulting JSON KAST), which would be a much more tailored test than the end-to-end integration tests we currently have.
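
A rough sketch of what that could look like, assuming passes stay as scala.Function1<Definition, Definition> values the way KoreBackend.steps() composes them; the registry and buildPipeline names below are hypothetical, not existing K code.

    // Hypothetical sketch: compose a pipeline from a comma-separated list of pass names
    // (e.g. taken from a CLI flag) looked up in a registry of named passes.
    static Function1<Definition, Definition> buildPipeline(
        Map<String, Function1<Definition, Definition>> registry, String steps) {
      Function1<Definition, Definition> pipeline = d -> d; // start from the identity pass
      for (String name : steps.split(",")) {
        Function1<Definition, Definition> pass = registry.get(name);
        if (pass == null) {
          throw new IllegalArgumentException("Unknown compiler pass: " + name);
        }
        pipeline = pipeline.andThen(pass);
      }
      return pipeline;
    }

The current hardcoded chain would then be equivalent to calling buildPipeline with the full default list of pass names.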

@Baltoli (Collaborator) commented Feb 22, 2024

A couple of things I noticed when looking through the code that builds the compilation pipeline:

  • We build a lot of similar passes multiple times because the description string passed to ModuleTransformer etc. breaks compositionality.
  • We have a lot of ad-hoc lambdas being built to close over the definition in question; this might be a common enough pattern that it's worth building into whatever the solution looks like.
  • This is a good opportunity to clear up some of our Scala code; we want to remove it all in the long term, so it would be good for whatever the solution here is to be written in Java.

I think the LLVM optimisation pipeline infrastructure is a pretty good place to look for design inspiration. Some features there that are worth looking at:

  • Ergonomic lifting of narrow passes into wider ones; in LLVM a FunctionPass very directly becomes a ModulePass etc. This isn't the case for our current design.
  • Pipeline manager objects; these would accommodate Everett's suggestion above of building a dynamic list of passes better than the current .andThen(...).andThen(...) approach.
  • Automatic scheduling of the pipeline based on some kind of invalidates / supplies / depends-on relationship between passes (eventually; we will need some more experiments to understand the dependencies properly). See the sketch after this list.
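
On the scheduling point, a minimal sketch of how a depends-on relation could drive pass ordering via a depth-first topological sort, using plain java.util collections; this is hypothetical illustration code, not part of the K codebase, and it omits cycle detection for brevity.

    // Hypothetical sketch: derive a pass ordering from a depends-on relation.
    // dependsOn maps a pass name to the set of passes that must run before it.
    static List<String> schedule(Map<String, Set<String>> dependsOn) {
      List<String> order = new ArrayList<>();
      Set<String> scheduled = new HashSet<>();
      for (String pass : dependsOn.keySet()) {
        visit(pass, dependsOn, scheduled, order);
      }
      return order;
    }

    static void visit(String pass, Map<String, Set<String>> dependsOn,
                      Set<String> scheduled, List<String> order) {
      if (!scheduled.add(pass)) {
        return; // already placed in the order (cycle detection omitted)
      }
      for (String prerequisite : dependsOn.getOrDefault(pass, Set.of())) {
        visit(prerequisite, dependsOn, scheduled, order);
      }
      order.add(pass); // all prerequisites are now earlier in the order
    }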

@ehildenb (Member) commented Feb 22, 2024

Some more information from investigation with @gtrepta:

The overall kompile pipeline (in Kompile.java, method Kompile.run(...)) has 3 major steps:

  • parseDefinition: parses, and runs 2 very simple passes:
    • applies sort synonyms
    • adds the implicit IO module import
  • checkDefinition: checks a bunch of structural things about the definition
  • pipeline.apply(...): runs KoreBackend.steps(), the bulk of the compiler pipeline. There is one check in this pipeline, checkSimplificationRules (which verifies that all simplification rules are functional); it sits here because it must run after macro expansion.

@ehildenb (Member) commented

We could pull applying sort synonyms and the implicit IO import out of parseDefinition into its own explicit step, and then call that step directly. Then we would be breaking the entire kompile pipeline into major "epochs", which could be made more fine-grained over time (as opposed to the current approach, where we start with the pipeline as fine-grained as the individual steps and try to understand each step).

Proposed rough sketch:

definition = parseDefinition(...);  // Read the definition in
definition = preCheckPasses(...); // apply sort synonyms and implicit IO module import
checkDefinition(definition);
definition = everythingUpToExpandMacros(...);
definition = expandMacros(...);
definition = everythingUpToCheckSimplificationRules(...);
checkSimplificationRules(definition);
definition = remainderOfKoreBackendSteps(...);

Then we slowly break each chunk of the kore steps out into other atomic units that must be run together.

This is just an idea of how to attack the problem from the other direction. Basically, here, we know that the "dependency/use" chain is a straight line, not a DAG, because of how the compiler is currently structured.

At the very least, we could pull out the preCheckPasses(...) and the checkSimplificationRules(...) pass, for more consistency.

@Scott-Guest (Contributor) commented Feb 22, 2024

As suggested by @ehildenb, I'll also work on recording all the attribute read-write dependencies between passes.

This will help to clean up the attribute usages in general and give us a subset of the overall pass dependency DAG.
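
A minimal sketch of how those recordings could be represented, purely hypothetical and not existing K code (a plain class works just as well as a record on older JDKs); the UNIQUE_ID example is taken from the genCoverage/numberSentences dependency discussed later in this thread.

    // Hypothetical sketch: record which attributes each pass reads and writes, and derive
    // a dependency edge whenever one pass reads an attribute that another pass writes.
    record PassAttributeUsage(String pass, Set<String> reads, Set<String> writes) {}

    // true if `later` must run after `earlier`, because it reads something `earlier` writes
    static boolean dependsOn(PassAttributeUsage later, PassAttributeUsage earlier) {
      return later.reads().stream().anyMatch(earlier.writes()::contains);
    }

    // e.g. dependsOn(new PassAttributeUsage("genCoverage", Set.of("UNIQUE_ID"), Set.of()),
    //                new PassAttributeUsage("numberSentences", Set.of(), Set.of("UNIQUE_ID")))
    // evaluates to true.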

rv-jenkins added a commit that referenced this issue Feb 26, 2024
#3973 

This defers adding K-IO to the main module from parsing to the point
where the coverage instrumentation gets generated. It also makes
GenerateCoverage's methods static and moves the check for the coverage
option outside of it, so that generating coverage just becomes a no-op
if the flag isn't set.

There still needs to be a check in DefinitionParsing on the coverage
flag to keep the K-IO module around at the step where unused modules get
trimmed from the definition.


@gtrepta (Contributor, Author) commented Feb 26, 2024

ApplySynonyms appears to be tightly bound to the inner parsing logic and can't be moved any later than resolveNonConfigBubbles, so it's probably best to leave it where it is in ParserUtils.loadModules for now. We could decide later whether we want to separate it out as its own step within the parser library when we do cleanups there.

@gtrepta (Contributor, Author) commented Mar 7, 2024

Next steps, after discussing with @ehildenb:

  • Make a CLI option to kompile that allows the user to specify which transformations are run and in what order.

This will allow faster experimentation with different orderings of the transformations to find dependencies, and will make it easier to do these sorts of experiments on downstream semantics.

  • Make a CLI option that tells kompile to skip outer parsing and instead read in a parsed KAST in JSON format to run the compilation steps on.

This, along with the previous option, allows us to generate isolated tests for each compiler pass with input/output JSON files. We can also capture the input and output of each individual step in the pipeline and see the actual changes it makes, or whether it makes any changes at all, depending on what's in the definition.

This could also be used to help integrate pyk kompile with the Java frontend, as pyk can do the outer parsing and then pass the result to kompile, where inner parsing and the pipeline transformations can be done.

@gtrepta (Contributor, Author) commented Mar 11, 2024

Here's a spreadsheet covering every regression test and each pipeline stage, recording whether that stage made a change to the definition (1) or made no change (0).

https://docs.google.com/spreadsheets/d/1Jq8XYdNpzTv9XPXQZx1cOTRKBNWsq1fmz3HH4Ipv0Cs/edit?usp=sharing

rv-jenkins added a commit that referenced this issue Mar 13, 2024
#3973 

Adds an option `--kore-backend-steps` to `kompile` which takes a list of
compilation steps to run in the kore backend pipeline.

Omitting this option is equivalent to
`resolveComm,resolveIO,resolveFun,resolveFunctionWithConfig,resolveStrict,resolveAnonVars,resolveContexts,numberSentences1,resolveHeatCoolAttribute,resolveSemanticCasts,subsortKItem1,constantFolding,propagateMacroToRules,guardOrs,resolveFreshConfigConstants,generateSortPredicateSyntax1,generateSortProjections1,expandMacros,addImplicitComputationCell,resolveFreshConstants,generateSortPredicateSyntax2,generateSortProjections2,checkSimplificationRules,subsortKItem2,addStrategyCellToRules,addStrategyRuleToMainModule,concretizeCells,genCoverage,addSemanticsModule,resolveConfigVar,addCoolLikeAtt,markExtraConcreteRules,removeAnywhereRules,generateSortPredicateRules,numberSentences2`


@Baltoli (Collaborator) commented Mar 14, 2024

@ehildenb

@gtrepta (Contributor, Author) commented Mar 18, 2024

Here's another spreadsheet. Each column represents a stage in the kore backend pipeline that was dropped from compilation, and each row is a test in the regression suite with its result after that stage was dropped. Each cell says whether the test passed (PASS), failed (FAIL), or hit an error during compilation (KOMPILE). Note that resolveHeatCoolAttribute was skipped for this experiment because dropping it produced interpreters that never terminate, which was too hard to deal with.

https://docs.google.com/spreadsheets/d/13jDHr49_EnIOSnhlKzbvpm8FXN620itrZBywmLuKRJs/edit?usp=sharing

@Baltoli (Collaborator) commented Mar 18, 2024

Nice! So from that second spreadsheet, any fully-green columns indicate that the compiler pass perhaps isn't being exercised properly by the regression test suite? Even if the internal data structures get changed, we still aren't observing any actual behavioural differences at the level of the test suite. It would be interesting to see if we can write a test for each green column that breaks if the pass is removed; doing so would also be a good way of understanding more concretely at the K source level what the passes affect.

@ehildenb (Member) commented

That's my interpretation as well. The fully-green columns I see are:

  • subsortKItem1
  • guardOrs
  • generateSortProjections2
  • checkSimplificationRules
  • subsortKItem2
  • addStrategyCellToRules
  • addStrategyRuleToMainModule
  • genCoverage
  • addCoolLikeAtt
  • removeAnywhereRules
  • numberSentences2

@Baltoli (Collaborator) commented Mar 18, 2024

> addStrategyCellToRules
> addStrategyRuleToMainModule

These, at least, are planned for removal.

@gtrepta (Contributor, Author) commented Mar 18, 2024

I've looked at guardOrs, introduced by #340. Referencing the first spreadsheet, the only tests where this stage does anything are or-haskell, issue-1789-rhsOr/llvm, and or-llvm.

It wraps #Or terms outside of a rewrite in an #as pattern, which is supposed to give them an unambiguous substitution. This was added five years ago, so either it's no longer necessary or other changes have made these tests miss the breaking case. @dwightguth do you know?

@gtrepta (Contributor, Author) commented Mar 18, 2024

generateSortProjections2 looks like it only creates projection productions and rules for the generated top and counter cells.

@Baltoli (Collaborator) commented Mar 19, 2024

@dwightguth doesn't see why things aren't broken without guardOrs; @gtrepta to investigate in more detail here to see what's going on.

  • Do the rules have an #Or pattern on the RHS when KORE is generated?
  • If yes, what's the generated LLVM code?
  • If no, why not?

@gtrepta (Contributor, Author) commented Mar 20, 2024

subsortKItem1 and subsortKItem2 don't seem to have any effect within the kore backend pipeline itself. Omitting them does, however, cause an error later, when a call to AddSortInjection attempts to compute the least upper bound of a collection of sorts. Keeping a subsortKItem pass at the end of the pipeline keeps everything working, and all of the regression tests pass.

This brings up something I'm noticing about invariants. AFAIK every sort being a subsort of KItem is supposed to be an invariant, but we're enforcing this by explicitly creating the syntax declarations. That leads to the phenomenon of an invariant being broken by one stage, staying broken for some intermediate stages, and then being fixed by another stage. This seems less than ideal from a maintainability standpoint: one would want to rely on every invariant they know about a K definition, but instead has to keep track of whether they're at a point in the pipeline where the invariant actually holds. Keeping these invariants from being broken, either with assertion checks or by making some of them hold implicitly, could help here.
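
As an illustration of the assertion-check idea, here is a hypothetical wrapper that re-checks an invariant after a pass runs; allSorts and isSubsortOfKItem are placeholder helpers standing in for whatever the real checks would be, not existing K API.

    // Hypothetical sketch: wrap a pass so a named invariant is re-checked after it runs.
    static Function1<Definition, Definition> withKItemInvariant(
        Function1<Definition, Definition> pass, String passName) {
      return def -> {
        Definition out = pass.apply(def);
        for (Sort sort : allSorts(out)) { // allSorts: placeholder for enumerating user sorts
          if (!isSubsortOfKItem(out, sort)) { // isSubsortOfKItem: placeholder subsort check
            throw new AssertionError(
                passName + " broke the invariant: " + sort + " is not a subsort of KItem");
          }
        }
        return out;
      };
    }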

@gtrepta (Contributor, Author) commented Mar 22, 2024

Here's a list of dependencies between compilation passes that I've found; I'll continue to update it as I find more. (A small sketch of turning these into an automated ordering check follows the list.)

  • expandMacros -> propagateMacroToRules
    expandMacros needs the macro attribute on rules with a macro production at the top of the LHS in order to expand that production elsewhere. propagateMacroToRules is meant to put the macro attribute on those rules, and if it doesn't, then expandMacros is very likely to find a macro production it can't expand and will throw an error.
  • genCoverage -> numberSentences
    genCoverage expects the UNIQUE_ID attribute to be on every rule in the definition and uses it for the coverage output when it instruments each rule to log itself. This attribute is added by numberSentences, and genCoverage will throw a NoSuchElementException if that hasn't happened. Also, we probably want genCoverage to happen after every generated rule that will be in the final definition is there, but it looks like that may not be the case.
  • resolveFreshConstants -> resolveFreshConfigConstants
    The initial value for GeneratedCounterCell depends on any fresh variables that are used in the initial configuration, and this is supposed to come from resolveFreshConfigConstants. If resolveFreshConstants happens first, then it's possible for a fresh constant that gets generated during execution to be equal to a fresh constant that was in the initial configuration. This case gets caught by the regression test in issue-3520-freshConfig.
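
The arrows above read "depends on", i.e. the right-hand pass must run before the left-hand one. A small sketch of turning such findings into an automated check on a proposed ordering; this is hypothetical illustration code, not part of the K codebase, and it assumes both pass names appear in the proposed order.

    // Hypothetical sketch: check a proposed pass ordering against the prerequisite pairs
    // found so far (mapping a prerequisite pass to the pass that depends on it).
    static List<String> orderingViolations(List<String> order, Map<String, String> prerequisiteOf) {
      List<String> violations = new ArrayList<>();
      prerequisiteOf.forEach((prerequisite, dependent) -> {
        if (order.indexOf(prerequisite) > order.indexOf(dependent)) {
          violations.add(prerequisite + " must run before " + dependent);
        }
      });
      return violations;
    }

    // e.g. orderingViolations(proposedOrder,
    //          Map.of("propagateMacroToRules", "expandMacros",
    //                 "numberSentences", "genCoverage",
    //                 "resolveFreshConfigConstants", "resolveFreshConstants"))
    // returns an empty list exactly when the proposed order respects all three dependencies.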

@gtrepta (Contributor, Author) commented Mar 25, 2024

I've added a second sheet to the spreadsheet of results from dropping pipeline steps. This one covers the regression tests that expect a failure with an error message (my earlier methodology couldn't cover these tests). If a cell says PASS, the test printed the expected error message; if it says FAIL, something else happened instead. This reveals a couple of pipeline steps that fail a test here even though they had completely green columns on the previous sheet:

  • checkSimplificationRules with checks/functionSimplification
  • removeAnywhereRules with issue-2909-allow-anywhere-haskell/check
