Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject duplicate P4Runtime election IDs from multiple controllers #857

Merged
merged 3 commits into from Dec 8, 2021

Conversation

pudelkoM
Copy link
Member

@pudelkoM pudelkoM commented Dec 4, 2021

P4Service currently does not reject duplicate election IDs from controllers. This leads to a data corruption when a slave controller sends a MastershipUpdate with the same election ID as the current master:

  • 3 Controllers connected: 1 (1111, slave), 2 (2222, master), 3 (1212, slave)
  • C1 sends arb. update with election id 2222, same as current master
  • Call entry: AddOrModifyController 1, 2222, ipv4:127.0.0.1:56994.
  • Controller set at beginning of call: Controller (2, 2222, ipv4:127.0.0.1:56994), Controller (3, 1212, ipv4:127.0.0.1:56994), Controller (1, 1111, ipv4:127.0.0.1:56994), .
  • C1 gets deleted from set: Controller (2, 2222, ipv4:127.0.0.1:56994), Controller (3, 1212, ipv4:127.0.0.1:56994), .
  • Try to insert C1 into set, but election ID already exists, C1 lost: Controller (2, 2222, ipv4:127.0.0.1:56994), Controller (3, 1212, ipv4:127.0.0.1:56994), .
  • Stratum falsely logs C1 as new master: Controller (connection_id: 1, election_id: 2222, uri: ipv4:127.0.0.1:56994) is connected as MASTER for node (aka device) with ID 123123123.
  • C2 receives update: Controller #2 mastership update: arbitration { device_id: 123123123 election_id { low: 2222 } status { } }
  • C3 receives update: Controller #3 mastership update: arbitration { device_id: 123123123 election_id { low: 2222 } status { code: 6 message: "You are not my master!" } }
  • No updates sent to C1 and the stream is still open, but not recognized by Stratum as a controller anymore
  • C1 might thinks it's master, but all calls will lead to ERR_PERMISSION_DENIED errors

This happens because we store Controllers in a set with a custom comparison function. Controllers are not hashed, but only compared by their election ID. Thus, two controllers with the same election ID are not possible.

// Map from node ID to the set of Controller instances corresponding to the
// external controller clients connected to that node. The Controller
// instances for each node are sorted such that the master (Controller
// with highest election_id) is the first element.
std::map<uint64, std::set<Controller, ControllerComp>> node_id_to_controllers_
GUARDED_BY(controller_lock_);

// Custom comparator for Controller class.
struct ControllerComp {
bool operator()(const Controller& x, const Controller& y) const {
// To make sure controller with the highest election_id is the 1st element
return x.election_id() > y.election_id();
}
};

This PR changes this by checking for insertion success. Failure indicates a duplicate and we close the stream, all according to spec:

P4RT Spec - Section 5.3:

Otherwise, if the MasterArbitrationUpdate message is received from an already connected controller:

If the election_id is set and is already used by another controller (excluding the controller making the request) for the same (device_id, role_id), the P4Runtime server shall terminate the stream by returning an INVALID_ARGUMENT error.

Additional considerations

Stratum is not fully P4RT spec compliant, with or without this fix. While this change brings us closer, a gap remains. At some point we should consider delegating P4RT related code to an external, well tested library.

Outright closing the stream without telling the offending controller the currently highest election ID puts the controller an an interesting situation:

  • To become master it has to know the current ID, which is only broadcast to backups
  • To become backup it has to find a lower, free election ID by guessing or bruteforce

@codecov
Copy link

codecov bot commented Dec 4, 2021

Codecov Report

Merging #857 (333d7f6) into main (05fa2e6) will increase coverage by 0.00%.
The diff coverage is 80.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #857   +/-   ##
=======================================
  Coverage   78.55%   78.56%           
=======================================
  Files         334      334           
  Lines       30054    30057    +3     
=======================================
+ Hits        23610    23613    +3     
  Misses       6444     6444           
Impacted Files Coverage Δ
stratum/hal/lib/common/p4_service.h 100.00% <ø> (ø)
stratum/hal/lib/common/p4_service.cc 82.93% <80.00%> (+0.09%) ⬆️

@pudelkoM pudelkoM changed the title Reject duplicate P4RT election IDs Reject duplicate P4Runtime election IDs from multiple controllers Dec 4, 2021
@pudelkoM
Copy link
Member Author

pudelkoM commented Dec 4, 2021

@pierventre FYI

pierventre
pierventre previously approved these changes Dec 6, 2021
@pierventre
Copy link
Contributor

@pudelkoM regarding your additional comments, I agree on using a well tested library. I am not fully sure about your final remarks - in theory an ONOS cluster should not need this and each instance can potentially guess the election id of the other instances. Happy to discuss today. Thanks a lot for this hot fix.

@pierventre
Copy link
Contributor

tested - works great

@bocon13 bocon13 added this to the 2021-12 Release milestone Dec 7, 2021
@bocon13 bocon13 added the P4/P4Runtime Things related to P4 or P4Runtime label Dec 7, 2021
Copy link
Member

@bocon13 bocon13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pudelkoM pudelkoM merged commit 7059d58 into main Dec 8, 2021
@pudelkoM pudelkoM deleted the p4rt-reject-dup-election-ids branch December 8, 2021 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4/P4Runtime Things related to P4 or P4Runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants