Adding CFM-1.1: CFM over ETHoCWoMPLSoGRE#4859
Adding CFM-1.1: CFM over ETHoCWoMPLSoGRE#4859ASHNA-AGGARWAL-KEYSIGHT wants to merge 5 commits intoopenconfig:mainfrom
Conversation
Pull Request Functional Test Report for #4859 / 2c93e90Virtual Devices
Hardware Devices
|
Summary of ChangesHello @ASHNA-AGGARWAL-KEYSIGHT, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates a new, extensive feature test for Connectivity Fault Management (CFM) within a complex network topology utilizing Ethernet over Control Word over MPLS over GRE encapsulation. The changes enable robust testing of CFM functionalities, including session lifecycle, packet correctness, fault detection, and performance metrics, by enhancing existing configuration utilities and introducing new validation mechanisms to ensure proper operation across diverse network devices. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive new test suite for CFM over ETHoCWoMPLSoGRE (CFM-1.1), including a large test file, numerous helper functions, and necessary deviations, particularly for Arista devices which rely on CLI-based configurations. While the changes are extensive and cover a new feature area, the review identified several critical and high-severity issues. These include a critical bug in the OpenConfig implementation for GRE next-hop groups, incorrect packet validation logic for CCM intervals and control words, and inverted logic in test validation functions. Additionally, there are significant concerns regarding maintainability and test isolation due to the use of global variables and stateful package design. Several style guide violations, such as the use of time.Sleep and disallowed IP address ranges, were also noted. Addressing these issues is crucial for the correctness, reliability, and maintainability of this new test suite.
| ueh1 := params.NetworkInstance.GetOrCreateStatic().GetOrCreateNextHop(params.NexthopGroupName).GetOrCreateEncapHeader(1) | ||
| for _, addr := range params.DstAddr { | ||
| ueh1.GetOrCreateUdpV4().SetDstIp(addr) | ||
| } | ||
|
|
||
| for _, addr := range params.SrcAddr { | ||
| ueh1.GetOrCreateUdpV4().SetSrcIp(addr) | ||
| } |
There was a problem hiding this comment.
The OpenConfig implementation for configuring multiple next-hops is incorrect. The loops for DstAddr and SrcAddr repeatedly call SetDstIp and SetSrcIp on the same UdpV4 object (ueh1). This overwrites the values in each iteration, meaning only the last IP from each slice will be configured. To configure multiple next-hops for the group, you should create multiple NextHop entries, each with its own encapsulation header configuration.
| var ( | ||
| sfBatch *gnmi.SetBatch | ||
| oam *oc.Oam |
There was a problem hiding this comment.
The use of package-level variables sfBatch and oam introduces global state into the test file. This can lead to unpredictable behavior, race conditions if tests are run in parallel, and makes tests difficult to reason about individually. These variables should be scoped locally within the functions that use them (e.g., configureDut or the test functions themselves) to ensure proper test isolation.
| if uniqueCount < tunnelCount { | ||
| t.Log("flows are not ECMP'd across all available tunnels as expected") | ||
| return | ||
| } | ||
|
|
||
| t.Errorf("error: traffic was load-balanced across %d GRE sources.", uniqueCount) |
There was a problem hiding this comment.
The logic in this function appears to be inverted. It logs a message and returns successfully if uniqueCount < tunnelCount (a failure condition), but reports an error via t.Errorf if uniqueCount >= tunnelCount (the success condition). This will cause the test to fail when it should pass and pass when it should fail. The logic should be corrected to report an error when the number of unique source IPs does not match the expected tunnelCount.
| if uniqueCount < tunnelCount { | |
| t.Log("flows are not ECMP'd across all available tunnels as expected") | |
| return | |
| } | |
| t.Errorf("error: traffic was load-balanced across %d GRE sources.", uniqueCount) | |
| if uniqueCount != tunnelCount { | |
| t.Errorf("traffic was not load-balanced across all %d GRE sources, found %d unique sources", tunnelCount, uniqueCount) | |
| return | |
| } | |
| t.Logf("traffic was correctly load-balanced across all %d GRE sources.", uniqueCount) |
There was a problem hiding this comment.
if this is always one tunnel! and the CFM is not ECMP'd then do we need the error? under what condition is this expected to be an error?
There was a problem hiding this comment.
We do not require an error notification when CFM is not ECMP'd. Instead, we need an error report when we receive PDUs on all 16 tunnels. The README specifies that we should verify that CCM PDUs are flowing consistently and not being ECMP'd across all available tunnels. I have added the necessary check accordingly.
Let me know if this clarifies your question.
| if cfmData[2]&0x0F != byte(expectedInterval) { | ||
| t.Errorf("ccm interval mismatch on packet; expected: %d, got: %d", expectedInterval, cfmData[2]&0x07) | ||
| } else { | ||
| t.Logf("packet has the correct CCM interval: %d", cfmData[2]&0x07) | ||
| } |
There was a problem hiding this comment.
The CCM interval is encoded in the last 3 bits of the flags field (octet 3 of the CCM PDU). The check cfmData[2]&0x0F is incorrect as it uses a 4-bit mask. The correct mask to extract the 3-bit interval value is 0x07. The logging statement already uses the correct mask, but the validation logic is flawed.
| if cfmData[2]&0x0F != byte(expectedInterval) { | |
| t.Errorf("ccm interval mismatch on packet; expected: %d, got: %d", expectedInterval, cfmData[2]&0x07) | |
| } else { | |
| t.Logf("packet has the correct CCM interval: %d", cfmData[2]&0x07) | |
| } | |
| // Verify interval field in CCM packet. | |
| if cfmData[2]&0x07 != byte(expectedInterval) { | |
| t.Errorf("ccm interval mismatch on packet; expected: %d, got: %d", expectedInterval, cfmData[2]&0x07) | |
| } else { | |
| t.Logf("packet has the correct CCM interval: %d", cfmData[2]&0x07) | |
| } |
| rdiBitSet := (cfmData[2] & 0x80) == 1 | ||
| t.Log("*******************************") | ||
| t.Log(rdiBitSet) | ||
| t.Log("*******************************") | ||
| if rdiBitSet { | ||
| t.Errorf("rdi bit verification failed on packet. Expected: %v, Got: %v", !verifyRDIBit, rdiBitSet) | ||
| } | ||
| t.Logf("packet RDI bit is correctly set to %v", verifyRDIBit) |
There was a problem hiding this comment.
The check for the RDI bit is incorrect. The expression (cfmData[2] & 0x80) == 1 will always be false when the RDI bit is set, as 0x80 (128) is not equal to 1. The correct way to check if the bit is set is (cfmData[2] & 0x80) != 0. Additionally, the debug logs using t.Log("*******************************") should be removed from the final code.
rdiBitSet := (cfmData[2] & 0x80) != 0
if rdiBitSet != verifyRDIBit {
t.Errorf("rdi bit verification failed on packet. Expected: %v, Got: %v", verifyRDIBit, rdiBitSet)
}
t.Logf("packet RDI bit is correctly set to %v", rdiBitSet)| return fmt.Errorf("could not write bytes to pcap file: %v", err) | ||
| } | ||
| defer os.Remove(f.Name()) // Clean up the temporary file | ||
| // defer os.Remove(f.Name()) // Clean up the temporary file |
There was a problem hiding this comment.
This line, which is responsible for cleaning up the temporary pcap file, is commented out. This will result in a resource leak, as temporary files will be left on the system after the test completes. This line should be uncommented to ensure proper cleanup.
| // defer os.Remove(f.Name()) // Clean up the temporary file | |
| defer os.Remove(f.Name()) // Clean up the temporary file |
| if packetVal.MPLSLayer.ControlWordHeader { | ||
| if len(mpls.Payload) >= 4 { | ||
| controlWord := mpls.Payload[:4] | ||
| if uint16(controlWord[0])<<8|uint16(controlWord[1]) == uint16(packetVal.MPLSLayer.ControlWordSequence) { | ||
| t.Logf("%v (32-bit field ) control word is inserted between the MPLS label stack and the Layer 2 payload (the Ethernet frame).0", packetVal.MPLSLayer.ControlWordSequence) | ||
| } | ||
| } else { | ||
| t.Errorf("Control Word header not found") | ||
| } | ||
| } |
There was a problem hiding this comment.
The validation logic for the MPLS control word is incorrect.
- It only checks the first 16 bits of the sequence number by casting a
uint32touint16. The full 32-bit sequence number should be validated. - According to RFC 4385, the first 4 bits of the control word must be zero. This check is missing.
Please update the logic to perform a complete and correct validation of the 32-bit control word.
if packetVal.MPLSLayer.ControlWordHeader {
if len(mpls.Payload) < 4 {
return fmt.Errorf("payload too short for Control Word header")
}
controlWord := mpls.Payload[:4]
// Per RFC 4385, the first 4 bits of the control word MUST be 0.
if (controlWord[0] & 0xF0) != 0 {
return fmt.Errorf("invalid control word format, first 4 bits are not zero: 0x%x", controlWord[0])
}
sequenceNumber := binary.BigEndian.Uint32(controlWord)
if sequenceNumber != packetVal.MPLSLayer.ControlWordSequence {
return fmt.Errorf("control word sequence number mismatch. Got: %d, Want: %d", sequenceNumber, packetVal.MPLSLayer.ControlWordSequence)
}
t.Logf("Control Word with sequence number %d found as expected.", sequenceNumber)
}| time.Sleep(10 * time.Second) | ||
| ate.OTG().StopTraffic(t) | ||
| time.Sleep(10 * time.Second) |
There was a problem hiding this comment.
The use of time.Sleep is discouraged by the repository style guide (line 53), which recommends using gnmi.Watch with .Await for waiting on conditions. Using fixed sleeps can lead to flaky tests that are either too slow or fail intermittently. Please refactor this to use a more deterministic waiting mechanism, such as waiting for a specific OTG state.
| } | ||
| helpers.GnmiCLIConfig(t, dut, cli) | ||
| } else { | ||
| // OC is not available |
There was a problem hiding this comment.
The comment // OC is not available is confusing because it's inside an else block that is meant to handle the OpenConfig case. If there is no OC path for this configuration, the function should probably not have this else block. If there is a path, the implementation should be added. Please clarify or implement the intended OC configuration.
internal/helpers/helpers.go
Outdated
|
|
||
| // GetGnmiCLIOutput sets config built with buildCliConfigRequest and returns the output. | ||
| func GetGnmiCLIOutput(t testing.TB, dut *ondatra.DUTDevice, config string) *gpb.GetResponse { | ||
| GnmiCLIConfig(t, dut, config) |
There was a problem hiding this comment.
The function GetGnmiCLIOutput first calls GnmiCLIConfig, which performs a gNMI Set operation. This is incorrect for show commands, which should only be performed via a Get request. Sending a show command as part of a Set request is unconventional and may lead to errors or unintended side effects. The call to GnmiCLIConfig should be removed from this helper.
func GetGnmiCLIOutput(t testing.TB, dut *ondatra.DUTDevice, config string) *gpb.GetResponse {
balaji6
left a comment
There was a problem hiding this comment.
Added few comments on SLM and error message. Please take a look, thanks.
4d1f267 to
ebede7c
Compare
50fc76a to
d1dab0e
Compare
Logs attached: https://partnerissuetracker.corp.google.com/issues/415458482#comment166 |
feature/bgp/route_selection/otg_tests/route_leakage_between_nondefault_vrfs/metadata.textproto
Show resolved
Hide resolved
| // Arista : https://partnerissuetracker.corp.google.com/issues/434922681 | ||
| bool otn_to_eth_assignment = 317; | ||
|
|
||
| // Devices that do not support import export policies configured in network instance |
There was a problem hiding this comment.
@ASHNA-AGGARWAL-KEYSIGHT - Can you please add the bug ID for this deviation with the vendor. Without the bug ID we cannot approve this PR
There was a problem hiding this comment.
@balaji6 Could you provide the bug IDs for the two deviations
proto/metadata.proto
Outdated
| bool reduced_ecmp_set_on_mixed_encap_decap_nh = 378; | ||
|
|
||
| // Devices that do not support mpls static pseudowire OC | ||
| bool mpls_static_pseudowire_oc_unsupported = 379; |
There was a problem hiding this comment.
@ASHNA-AGGARWAL-KEYSIGHT - Please add the relevant bug ID as mentioned above.
|
@balaji6 - Can you please help validate this PR in google environment, so we know at least it is passing. |
d1dab0e to
3bcc69c
Compare
3bcc69c to
0c078e6
Compare
Readme Location:
https://github.com/openconfig/featureprofiles/blob/main/feature/cfm/otg_tests/cfm_base/README.md
Have raised an issue:
https://partnerissuetracker.corp.google.com/issues/446782522
logs attached: https://partnerissuetracker.corp.google.com/issues/415458482