Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Testing] Add Travis Validation Tests for P4C. #2458

Merged
merged 8 commits into from Aug 5, 2020

Conversation

fruffy
Copy link
Collaborator

@fruffy fruffy commented Jun 30, 2020

This pull request add translation validation to the CI workflow of p4c. Translation validation certifies that a particular compiler pass has correctly transformed a given input program without changing its semantics. To perform translation validation for p4c, we use a symbolic interpreter (Gauntlet) to transform P4 programs into Z3 formulas. The Gauntlet workflow requires two external dependencies, which are described in this source organization. You can also find a detailed description of the tool and basic usage instructions there. I currently maintain this organization.

To validate a P4 program, the symbolic interpreter converts the program into a Z3 formula capturing its input-output semantics. An equivalence checker then submits the Z3 formulas of a program before and after a compiler pass to the Z3 SMT solver. The solver tries to find an input that violates equivalence of these two formulas. If it finds such an input, a validation error is reported. Translation validation has two advantages over fuzz testing. First, it can accurately detect subtle differences in program semantics without any knowledge about expected input packets or table entries. Second, by accessing intermediate P4 programs after each compiler pass, we can pinpoint the erroneous pass. This tool specifically tests the front and mid end passes of p4c using p4test.

The interpreter is written as an extension to p4c and uses the IR generated by the p4c parser to determine the semantics of the P4 program. Each programmable block of a P4 package represents an independent Z3 formula. For example, the v1model package has 6 different independent programmable blocks: Parser, VerifyChecksum, Ingress, Egress, ComputeChecksum, and Deparser. For each block, a new Z3 formula is generated.

Concretely, the semantic representation of the Ingress pipe of the program key-bmv2.p4 looks like this:

INFO:PIPE ig:                                                                                                   
P4 PROGRAM FRAGMENT          INPUT                        OUTPUT                                                
---------------------------  ---------------------------  ---------------------------------------------------   
h.h.a                        If(0_valid, 0(ig), invalid)  If(0_valid, 0(ig), invalid)                           
h.h.b                        If(0_valid, 1(ig), invalid)  If(0_valid,                                           
                                                             If(And(2*0(ig) == t_table_key_0, 1 == t_action),   
                                                                0(ig),                                          
                                                                1(ig)),                                         
                                                             invalid)                                           
sm.ingress_port              2(ig)                        2(ig)                                                 
sm.egress_spec               3(ig)                        0                                                     
sm.egress_port               4(ig)                        4(ig)                                                 
sm.instance_type             5(ig)                        5(ig)                                                 
sm.packet_length             6(ig)                        6(ig)                                                 
sm.enq_timestamp             7(ig)                        7(ig)                                                 
sm.enq_qdepth                8(ig)                        8(ig)                                                 
sm.deq_timedelta             9(ig)                        9(ig)                                                 
sm.deq_qdepth                10(ig)                       10(ig)                                                
sm.ingress_global_timestamp  11(ig)                       11(ig)                                                
sm.egress_global_timestamp   12(ig)                       12(ig)                                                
sm.mcast_grp                 13(ig)                       13(ig)                                                
sm.egress_rid                14(ig)                       14(ig)                                                
sm.checksum_error            15(ig)                       15(ig)                                                
sm.parser_error              16(ig)                       16(ig)                                                
sm.priority                  17(ig)                       17(ig)                                                 

This representation can be obtained by running python3 get_semantics.py -i p4c/testdata/p4_16_samples/key-bmv2.p4.

Here we set h.h.b to h.h.a, only if there is a match-action pair in the table that has the key h.h.a + h.h.a and executes the action a(). ig corresponds to a symbolic input packet, whereas 0(ig), 1(ig), 2(ig),... represent the flattened members of the parsed packet in canonical form (e.g., 0(ig) is the member h.h.a of inout Headers h). We use this semantic representation in the INPUT/OUTPUT columns for equality comparison.

Specific usage in the context of continuous integration

After this pull request, Gauntlet will run on every commit and catch new semantic compiler bugs as they are introduced. Gauntlet will run translation validation of all the passes emitted by p4test on every program in p4c/testdata/p4_16_samples/. If all emitted program versions are equal, the test passes. Otherwise, the test fails. When the validation tool finds a failing test, it will emit VIOLATION and provide the two passes that differ. At this moment, only information about which two passes are different is available.

The Travis check will fail if Gauntlet detects a problematic pass and the log will show the pass that has failed. A developer who has Gauntlet installed can run python3 validate_p4_translation.py -i [buggy_program.p4] to identify the defective pass. There may be false positives or interpreter crashes, which will require fixing by me. In that case, please file an issue at https://github.com/p4gauntlet/p4_tv.

A word of caution: Just because Gauntlet does not detect an issue with a particular program compilation, there is no guarantee that the compiler is completely bug free. The interpreter may produce incorrect semantics or have bugs itself. This tool is intended to find issues on a best-effort basis.

Unimplemented language constructs and current limitations

The interpreter passes the majority of P4-16 tests in the compiler suite. Failing tests can be found here. There are still several open problems that need fixing, but they are being worked on:

  • Semantics for Runtime Header Indices, Annotations, Range, String Literals, and LPM/Ternary matches are missing.
  • Semantics for table properties such as counters, meters, or action profiles are not implemented.
  • Symbolic outputs of extern methods do not respect call order or input values yet (they always return the same symbolic output).
  • Method overloading in externs is not yet supported.
  • Type inference for generics does not work for control and parser bodies yet.
  • Variable scoping does not work in all cases (e.g., shadowing).
  • Parser loops are currently bounded using a fixed value.
  • Some specific P4 specification method semantics such as extract or lookahead are missing.

Copy link
Contributor

@mihaibudiu mihaibudiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually looks pretty good.
Can you write a longer description of what this tool does, so it can be reviewed by other people as well?
I have talked to @fruffy about this offline.

ln -s /p4c /gauntlet/p4c

# Install Gauntlet Python dependencies locally
# Unfortunately we need Python 3.6, which Xenial does not have
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's so special about 3.6?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f-strings. I have been using them a lot in the code because they are nicer to read and use. Python 3.6 is available with 16.10 but apparently not 16.04.

@fruffy
Copy link
Collaborator Author

fruffy commented Jul 11, 2020

Made a bunch of changes to the Z3 representation of a P4 program. The validation now covers all passes, including ExpandLookahead. The tests also include fabric.p4 now.
A detailed description can be found in the readme of https://github.com/p4gauntlet/p4_tv. Should I add an extra readme here, too? I can also try to provide an estimate on the language constructs that are not fully implemented yet.

@fruffy fruffy marked this pull request as ready for review July 11, 2020 10:21
@hesingh
Copy link
Contributor

hesingh commented Jul 11, 2020

Instead of adding any new readme to p4_tv, maybe edit p4c/README.md for a mention of this tool. After all the README.md mentions make check.

Also, since p4c/docs/compiler-design.pptx includes a slide or two regarding debugging a failed test with make check, the slides should be updated to mention this tool.

It would be very helpful to get an estimate on the language constructs not fully implemented yet.

@mihaibudiu
Copy link
Contributor

My request was for a comment attached to this PR that describes what your tool does.
Note that if we merge this tool you will be responsible for keeping it up-to-date, since our build will depend on it.

@fruffy
Copy link
Collaborator Author

fruffy commented Jul 14, 2020

I updated the description.

@mihaibudiu
Copy link
Contributor

Does anyone object to merging this tool? It may lengthen the compilation process, but it may find bugs early in the compiler.

@mihaibudiu
Copy link
Contributor

Correction, it may lengthen the testing process.

@antoninbas
Copy link
Member

antoninbas commented Aug 4, 2020

@mbudiu-vmw it's one more job but it's executed in parallel. It seems that the new job doesn't take more time than the existing ones. The issues is that Travis limits open-source projects to 5 concurrent builds (which I believe apply across the entire p4lang organization?). p4c alone runs 6 jobs as part of CI (7 with this change). Since we were already above the limit of 5, and we are still under 10 with this new job, there isn't much of an impact, at least when p4c is the only p4lang project running CI at a given time.

Edit: actually there is a small impact since the jobs don't all take the same amount of time, (4mins, 8 mins, and 18mins for all the other ones), but I wouldn't worry about it... If it becomes an issue, we can rethink the CI testing, and move some jobs to Github actions (no limit on concurrent jobs IIRC). It's more important to have better coverage.

Copy link
Member

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mihaibudiu mihaibudiu merged commit a5c6c5b into p4lang:master Aug 5, 2020
@fruffy fruffy deleted the travis_validation_tests branch August 21, 2020 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants