New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Testing] Add Travis Validation Tests for P4C. #2458
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually looks pretty good.
Can you write a longer description of what this tool does, so it can be reviewed by other people as well?
I have talked to @fruffy about this offline.
ln -s /p4c /gauntlet/p4c | ||
|
||
# Install Gauntlet Python dependencies locally | ||
# Unfortunately we need Python 3.6, which Xenial does not have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's so special about 3.6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f-strings. I have been using them a lot in the code because they are nicer to read and use. Python 3.6 is available with 16.10 but apparently not 16.04.
Made a bunch of changes to the Z3 representation of a P4 program. The validation now covers all passes, including ExpandLookahead. The tests also include fabric.p4 now. |
Instead of adding any new readme to p4_tv, maybe edit p4c/README.md for a mention of this tool. After all the README.md mentions Also, since p4c/docs/compiler-design.pptx includes a slide or two regarding debugging a failed test with It would be very helpful to get an estimate on the language constructs not fully implemented yet. |
My request was for a comment attached to this PR that describes what your tool does. |
I updated the description. |
Does anyone object to merging this tool? It may lengthen the compilation process, but it may find bugs early in the compiler. |
Correction, it may lengthen the testing process. |
@mbudiu-vmw it's one more job but it's executed in parallel. It seems that the new job doesn't take more time than the existing ones. The issues is that Travis limits open-source projects to 5 concurrent builds (which I believe apply across the entire p4lang organization?). p4c alone runs 6 jobs as part of CI (7 with this change). Since we were already above the limit of 5, and we are still under 10 with this new job, there isn't much of an impact, at least when p4c is the only p4lang project running CI at a given time. Edit: actually there is a small impact since the jobs don't all take the same amount of time, (4mins, 8 mins, and 18mins for all the other ones), but I wouldn't worry about it... If it becomes an issue, we can rethink the CI testing, and move some jobs to Github actions (no limit on concurrent jobs IIRC). It's more important to have better coverage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This pull request add translation validation to the CI workflow of p4c. Translation validation certifies that a particular compiler pass has correctly transformed a given input program without changing its semantics. To perform translation validation for p4c, we use a symbolic interpreter (Gauntlet) to transform P4 programs into Z3 formulas. The Gauntlet workflow requires two external dependencies, which are described in this source organization. You can also find a detailed description of the tool and basic usage instructions there. I currently maintain this organization.
To validate a P4 program, the symbolic interpreter converts the program into a Z3 formula capturing its input-output semantics. An equivalence checker then submits the Z3 formulas of a program before and after a compiler pass to the Z3 SMT solver. The solver tries to find an input that violates equivalence of these two formulas. If it finds such an input, a validation error is reported. Translation validation has two advantages over fuzz testing. First, it can accurately detect subtle differences in program semantics without any knowledge about expected input packets or table entries. Second, by accessing intermediate P4 programs after each compiler pass, we can pinpoint the erroneous pass. This tool specifically tests the front and mid end passes of p4c using
p4test
.The interpreter is written as an extension to p4c and uses the IR generated by the p4c parser to determine the semantics of the P4 program. Each programmable block of a P4 package represents an independent Z3 formula. For example, the v1model package has 6 different independent programmable blocks: Parser, VerifyChecksum, Ingress, Egress, ComputeChecksum, and Deparser. For each block, a new Z3 formula is generated.
Concretely, the semantic representation of the Ingress pipe of the program key-bmv2.p4 looks like this:
This representation can be obtained by running
python3 get_semantics.py -i p4c/testdata/p4_16_samples/key-bmv2.p4
.Here we set
h.h.b
toh.h.a
, only if there is a match-action pair in the table that has the keyh.h.a + h.h.a
and executes the actiona()
.ig
corresponds to a symbolic input packet, whereas0(ig)
,1(ig)
,2(ig)
,... represent the flattened members of the parsed packet in canonical form (e.g.,0(ig)
is the memberh.h.a
ofinout Headers h
). We use this semantic representation in the INPUT/OUTPUT columns for equality comparison.Specific usage in the context of continuous integration
After this pull request, Gauntlet will run on every commit and catch new semantic compiler bugs as they are introduced. Gauntlet will run translation validation of all the passes emitted by
p4test
on every program inp4c/testdata/p4_16_samples/
. If all emitted program versions are equal, the test passes. Otherwise, the test fails. When the validation tool finds a failing test, it will emitVIOLATION
and provide the two passes that differ. At this moment, only information about which two passes are different is available.The Travis check will fail if Gauntlet detects a problematic pass and the log will show the pass that has failed. A developer who has Gauntlet installed can run
python3 validate_p4_translation.py -i [buggy_program.p4]
to identify the defective pass. There may be false positives or interpreter crashes, which will require fixing by me. In that case, please file an issue at https://github.com/p4gauntlet/p4_tv.A word of caution: Just because Gauntlet does not detect an issue with a particular program compilation, there is no guarantee that the compiler is completely bug free. The interpreter may produce incorrect semantics or have bugs itself. This tool is intended to find issues on a best-effort basis.
Unimplemented language constructs and current limitations
The interpreter passes the majority of P4-16 tests in the compiler suite. Failing tests can be found here. There are still several open problems that need fixing, but they are being worked on:
Range, String Literals, and LPM/Ternary matchesare missing.Type inference for generics does not work for control and parser bodies yet.