-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
protocol: implement peering links #4299
Conversation
Running However, some of the traceroute tests run via
Doing the same with a traceroute:
Inspecting
So MAC verification for SCMP is still an issue on peering links, at least in traceroute scenarios. Any pointers what I need to look at to make this work? |
0557184
to
d47f8d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So MAC verification for SCMP is still an issue on peering links, at least in traceroute scenarios. Any pointers what I need to look at to make this work?
I would look at the code paths that differ between a SCION and SCMP info packet.
Probably, there is some small bug in the handling of the router.
What also strikes me as odd is that we have 6 interfaces, but there is only 4 output lines.
I would expect one line per interface.
Maybe the bug is also hiding in how the packet is composed
Reviewable status: 0 of 11 files reviewed, 1 unresolved discussion (waiting on @matzf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 11 files at r1.
Reviewable status: 3 of 11 files reviewed, 9 unresolved discussions (waiting on @benthor and @matzf)
pkg/slayers/path/scion/base.go
line 79 at r2 (raw file):
if int(s.PathMeta.CurrHF) >= s.NumHops-1 { s.PathMeta.CurrHF = uint8(s.NumHops - 1) return serrors.New(fmt.Sprintf("path already at end, currhf %d, numhops %d",
use the errCtx of serrors.New
(see https://pkg.go.dev/github.com/scionproto/scion@v0.8.0/pkg/private/serrors#New)
Suggestion:
return serrors.New("path already at end", "curr_hf", s.PathMeta.CurrHF, "num_hops", s.NumHops)
private/topology/json/json.go
line 107 at r2 (raw file):
MTU int `json:"mtu"` BFD *BFD `json:"bfd,omitempty"` RemoteIFID common.IFIDType `json:"remote_if_id,omitempty"`
Suggestion:
RemoteIFID common.IFIDType `json:"remote_interface_id,omitempty"`
router/dataplane.go
line 95 at r2 (raw file):
linkTypes map[uint16]topology.LinkType neighborIAs map[uint16]addr.IA remotePeerIF map[uint16]uint16
Suggestion:
peerInterfaces map[uint16]uint16
router/dataplane.go
line 257 at r2 (raw file):
// a different type, this method will return an error. This can only // be called on a not yet running dataplane. func (d *DataPlane) AddRemotePeer(localifID, remoteifID uint16) error {
Suggestion:
func (d *DataPlane) AddRemotePeer(local, remote uint16) error {
router/dataplane.go
line 262 at r2 (raw file):
} if _, exists := d.remotePeerIF[localifID]; exists { return serrors.WithCtx(alreadySet, "localifID", localifID)
Suggestion:
return serrors.WithCtx(alreadySet, "local_interface", localifID)
router/dataplane.go
line 846 at r2 (raw file):
} // TODO: proper error
Adding a comment, such that we do not forget about this.
router/dataplane.go
line 848 at r2 (raw file):
// TODO: proper error err := serrors.New("TODO: segment length error (peering)") result := processResult{}
drop this variable and just inline it.
router/dataplane.go
line 862 at r2 (raw file):
// The peer hop fields are the last hop field on the first path // segment and the first hop field of the second path segment. currHF := p.path.PathMeta.CurrHF
Something about condition feels off.
Based on the ConsDir we can determine whether we need to be the first hop or the last hop on the segment.
Also you are mentioning first or second path segment, but in the code we only check SegLen[0].
I wonder why it still works.
Code quote:
// The peer hop fields are the last hop field on the first path
// segment and the first hop field of the second path segment.
currHF := p.path.PathMeta.CurrHF
segLen := p.path.PathMeta.SegLen[0]
p.peering = currHF == segLen-1 || currHF == segLen
return result, nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r2.
Reviewable status: 4 of 11 files reviewed, 9 unresolved discussions (waiting on @benthor and @matzf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r2.
Reviewable status: 7 of 11 files reviewed, 9 unresolved discussions (waiting on @benthor and @matzf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 11 files reviewed, 9 unresolved discussions (waiting on @matzf and @oncilla)
router/dataplane.go
line 862 at r2 (raw file):
Previously, oncilla (Dominik Roos) wrote…
Something about condition feels off.
Based on the ConsDir we can determine whether we need to be the first hop or the last hop on the segment.
Also you are mentioning first or second path segment, but in the code we only check SegLen[0].
I wonder why it still works.
I see what you mean, something looks not quite right yet. I'm a bit out of my depth here, because this code was your own contribution in the old PR and the crucial improvement over my own attempts to make peering work again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 11 files reviewed, 8 unresolved discussions (waiting on @matzf and @oncilla)
pkg/slayers/path/scion/base.go
line 79 at r2 (raw file):
Previously, oncilla (Dominik Roos) wrote…
use the errCtx of serrors.New
(see https://pkg.go.dev/github.com/scionproto/scion@v0.8.0/pkg/private/serrors#New)
Done.
private/topology/json/json.go
line 107 at r2 (raw file):
MTU int `json:"mtu"` BFD *BFD `json:"bfd,omitempty"` RemoteIFID common.IFIDType `json:"remote_if_id,omitempty"`
Done.
router/dataplane.go
line 95 at r2 (raw file):
linkTypes map[uint16]topology.LinkType neighborIAs map[uint16]addr.IA remotePeerIF map[uint16]uint16
Done.
router/dataplane.go
line 257 at r2 (raw file):
// a different type, this method will return an error. This can only // be called on a not yet running dataplane. func (d *DataPlane) AddRemotePeer(localifID, remoteifID uint16) error {
Done.
router/dataplane.go
line 262 at r2 (raw file):
} if _, exists := d.remotePeerIF[localifID]; exists { return serrors.WithCtx(alreadySet, "localifID", localifID)
Done.
router/dataplane.go
line 848 at r2 (raw file):
Previously, oncilla (Dominik Roos) wrote…
drop this variable and just inline it.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r3.
Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)
router/dataplane.go
line 862 at r2 (raw file):
Previously, benthor wrote…
I see what you mean, something looks not quite right yet. I'm a bit out of my depth here, because this code was your own contribution in the old PR and the crucial improvement over my own attempts to make peering work again.
Ah, actually now that I look at it closer, the conditions seems to be right:
currHF == segLen-1
: The last hop on the first segment
currHF == segLen
: The first hop on the second segment
We do not need to check the second segment length because it is the first hop on the second segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, my suspicion is that there is not really an issue in the router, but rather in the scion tool.
When doing traceroute, between the direct peers, no packet is even sent.
Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, after fixing traceroute, I got to the following:
root@1793b53d68a3:/share# scion traceroute 2-ff00:0:212,172.20.0.167 -i
Available paths to 2-ff00:0:212:
3 Hops:
[ 0] Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212] MTU: 1472 NextHop: 172.20.0.114:30042
5 Hops:
[ 1] Hops: [1-ff00:0:113 3>3 1-ff00:0:112 1>2 1-ff00:0:110 3>3 2-ff00:0:210 2>7 2-ff00:0:212] MTU: 1400 NextHop: 172.20.0.114:30042
Choose path: 0
Using path:
Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212] MTU: 1472 NextHop: 172.20.0.114:30042
0 1-ff00:0:113,172.20.0.114 IfID=3 1.111ms 0.753ms 0.706ms
1 ?? * * *
2 1-ff00:0:112,172.20.0.100 IfID=42 1.782ms 1.329ms 1.491ms
3 ?? * * *
Error: packets were lost
Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is related to segments where the peer flag is set, but they are not traversed until the peer interface.
Investigating further
Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
root@1793b53d68a3:/share# scion traceroute 2-ff00:0:212,172.20.0.167 -i
Available paths to 2-ff00:0:212:
3 Hops:
[ 0] Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212] MTU: 1472 NextHop: 172.20.0.114:30042
5 Hops:
[ 1] Hops: [1-ff00:0:113 3>3 1-ff00:0:112 1>2 1-ff00:0:110 3>3 2-ff00:0:210 2>7 2-ff00:0:212] MTU: 1400 NextHop: 172.20.0.114:30042
Choose path: 0
Using path:
Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212] MTU: 1472 NextHop: 172.20.0.114:30042
0 1-ff00:0:113,172.20.0.114 IfID=3 1.148ms 0.683ms 0.828ms
1 1-ff00:0:112,172.20.0.99 IfID=3 1.908ms 1.138ms 1.136ms
2 1-ff00:0:112,172.20.0.100 IfID=42 2.586ms 1.848ms 2.487ms
3 2-ff00:0:212,172.20.0.164 IfID=23 2.708ms 2.239ms 2.095ms
alright, turns out IsXover is a trap. I will add a diff shortly
Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diff --git a/pkg/slayers/path/scion/base.go b/pkg/slayers/path/scion/base.go
index f6d7651dc..f0435a22d 100644
--- a/pkg/slayers/path/scion/base.go
+++ b/pkg/slayers/path/scion/base.go
@@ -84,7 +84,11 @@ func (s *Base) IncPath() error {
return nil
}
-// IsXover returns whether we are at a crossover point.
+// IsXover returns whether we are at a crossover point. This includes all
+// segment switches, even over a peering link. Note that handling of a regular
+// segment switch and handling of a segment switch over a peering link are
+// fundamentally different. To distinguish the two, you will need to extract
+// the information from the info field.
func (s *Base) IsXover() bool {
return s.PathMeta.CurrINF != s.infIndexForHF(s.PathMeta.CurrHF+1)
}
diff --git a/router/dataplane.go b/router/dataplane.go
index 303e3ea41..d3054f94e 100644
--- a/router/dataplane.go
+++ b/router/dataplane.go
@@ -983,12 +983,15 @@ func (p *scionPacketProcessor) verifyCurrentMAC() (processResult, error) {
slayers.SCMPCodeInvalidHopFieldMAC),
},
&slayers.SCMPParameterProblem{Pointer: p.currentHopPointer()},
- serrors.New("MAC verification failed", "expected", fmt.Sprintf(
- "%x", fullMac[:path.MacLen]),
+ serrors.New("MAC verification failed",
+ "expected", fmt.Sprintf("%x", fullMac[:path.MacLen]),
"actual", fmt.Sprintf("%x", p.hopField.Mac[:path.MacLen]),
"cons_dir", p.infoField.ConsDir,
- "if_id", p.ingressID, "curr_inf", p.path.PathMeta.CurrINF,
- "curr_hf", p.path.PathMeta.CurrHF, "seg_id", p.infoField.SegID),
+ "if_id", p.ingressID,
+ "curr_inf", p.path.PathMeta.CurrINF,
+ "curr_hf", p.path.PathMeta.CurrHF,
+ "seg_id", p.infoField.SegID,
+ ),
)
}
// Add the full MAC to the SCION packet processor,
@@ -1513,7 +1516,7 @@ func (p *scionPacketProcessor) prepareSCMP(scmpH *slayers.SCMP, scmpP gopacket.S
revPath := revPathTmp.(*scion.Decoded)
// Revert potential path segment switches that were done during processing.
- if revPath.IsXover() {
+ if revPath.IsXover() && !p.peering {
if err := revPath.IncPath(); err != nil {
return nil, serrors.Wrap(cannotRoute, err, "details", "reverting cross over for SCMP")
}
@@ -1523,7 +1526,7 @@ func (p *scionPacketProcessor) prepareSCMP(scmpH *slayers.SCMP, scmpP gopacket.S
_, external := p.d.external[p.ingressID]
if external {
infoField := &revPath.InfoFields[revPath.PathMeta.CurrINF]
- if infoField.ConsDir {
+ if infoField.ConsDir && !p.peering {
hopField := revPath.HopFields[revPath.PathMeta.CurrHF]
infoField.UpdateSegID(hopField.Mac)
}
diff --git a/scion/traceroute/traceroute.go b/scion/traceroute/traceroute.go
index 8c9c75251..175a58a87 100644
--- a/scion/traceroute/traceroute.go
+++ b/scion/traceroute/traceroute.go
@@ -145,6 +145,7 @@ func (t *tracerouter) Traceroute(ctx context.Context) (Stats, error) {
defer log.HandlePanic()
t.drain(ctx)
}()
+
prevXover := false
for i := 0; i < len(idxPath.HopFields); i++ {
hf := idxPath.PathMeta.CurrHF
@@ -162,7 +163,9 @@ func (t *tracerouter) Traceroute(ctx context.Context) (Stats, error) {
t.updateHandler(u)
}
}
- xover := idxPath.IsXover()
+ // Peering links do not count as regular cross overs. For peering
+ // links we probe all interfaces on the path.
+ xover := idxPath.IsXover() && !info.Peer
// The last hop of the path isn't probed, only the ingress interface is
// relevant.
// At a crossover (segment change) only the ingress interface is
Reviewable status: 6 of 11 files reviewed, 2 unresolved discussions (waiting on @benthor and @matzf)
An odd bug keeps triggering on integration test. I've changed
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 12 files reviewed, 2 unresolved discussions (waiting on @matzf and @oncilla)
router/dataplane.go
line 862 at r2 (raw file):
Previously, oncilla (Dominik Roos) wrote…
Ah, actually now that I look at it closer, the conditions seems to be right:
currHF == segLen-1
: The last hop on the first segment
currHF == segLen
: The first hop on the second segmentWe do not need to check the second segment length because it is the first hop on the second segment.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 11 files at r1, 1 of 4 files at r3, 3 of 3 files at r4, 1 of 1 files at r5.
Reviewable status: all files reviewed (commit messages unreviewed), 1 unresolved discussion (waiting on @benthor and @matzf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need a bit more information.
- What changes to the topology did you do?
- What tool fails: ping vs traceroute vs showpaths?
- Between which hosts is the failure
Reviewable status: all files reviewed (commit messages unreviewed), 1 unresolved discussion (waiting on @benthor and @matzf)
My latest commit changed the following: diff --git a/topology/peering-test.topo b/topology/peering-test.topo
index 40aed961f..b32873948 100644
--- a/topology/peering-test.topo
+++ b/topology/peering-test.topo
@@ -33,6 +33,7 @@ links:
- {a: "1-ff00:0:110#2", b: "1-ff00:0:112#1", linkAtoB: CHILD, bw: 500}
- {a: "1-ff00:0:112#3", b: "1-ff00:0:113#3", linkAtoB: CHILD, bw: 500}
- {a: "1-ff00:0:113#4", b: "1-ff00:0:114#4", linkAtoB: CHILD, bw: 500}
+ - {a: "1-ff00:0:112#47", b: "1-ff00:0:114#11", linkAtoB: PEER}
- {a: "1-ff00:0:110#3", b: "2-ff00:0:210#3", linkAtoB: CORE}
- {a: "2-ff00:0:210#1", b: "2-ff00:0:211#3", linkAtoB: CHILD, mtu: 1280}
- {a: "2-ff00:0:210#2", b: "2-ff00:0:212#7", linkAtoB: CHILD, bw: 500}
- {a: "2-ff00:0:212#3", b: "2-ff00:0:213#3", linkAtoB: CHILD, bw: 500}
- {a: "1-ff00:0:112#42", b: "2-ff00:0:212#23", linkAtoB: PEER}
Ping as well as traceroute fail. It's easiest to verify when running
Between any hosts that select a path via one of the two peering links offered by AS
|
Addendum: The bug has an interesting side-effect on
With an additional peering link on AS -112 enabled,
|
Unrelated question: should I rebase or merge from master at this point or would that just complicated matters? |
Generally, feel free to merge or rebase at any point. The Reviewable tool is very robust and does not easily get confused. The rebase-and-squash merge workflow that we use in this repository means that you don't need to worry about a clean commit history in your branch. The merge conflicts that currently come up only come from moving the function calls for Btw. I've looked at the problem that show up with the changed peering topology, but haven't been able to figure out what is going on yet. I'll try to look into this again and get back to you by next week. |
I reverted Not sure if rebasing at this point wasn't a mistake. Some chaos couldn't be avoided, plus I had to make a few validation exceptions for peering (marked with Even with the simple topology, Some pointers about where to go from here would be appreciated. |
Yep, the checks in Sorry that I've not managed to give a more detailed feedback on your issues. I'm on leave until mid next week, but I will try to look into it after I return. Maybe someone else can already have a look in the meantime. |
That's already covered by the validation exception that I added ( |
The latest commit contains a patch by @matzf for a tricky regression that apparently got introduced by changing a With this fix, the issues that plagued this PR for the last two months finally seem resolved. Peering links work again (also in more complex topologies). In other words, we should be back on track. 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and appears to be working well now!
Some specific tests for the peering cases in the router would still be required before merging this.
There are two places, and ideally we have cases for both sets of tests: TestProcessPkt
in dataplane_test.go, which is sort of a unit test, and the braccept
router integration tests (see tools/braccept/cases/doc.go).
Would you have capacity work on this?
Reviewed 2 of 11 files at r1, 1 of 3 files at r4, 1 of 27 files at r6, 1 of 39 files at r8, 7 of 59 files at r10, 5 of 7 files at r12, all commit messages.
Reviewable status: 19 of 71 files reviewed, 7 unresolved discussions (waiting on @benthor and @lukedirtwalker)
private/topology/json/json.go
line 115 at r12 (raw file):
MTU int `json:"mtu"` BFD *BFD `json:"bfd,omitempty"` RemoteIFID common.IFIDType `json:"remote_interface_id,omitempty"`
Nit: "official" preferred form is IfID
.
router/dataplane.go
line 989 at r12 (raw file):
// XXX(benthor) disabling check in case of peering packet // is probably insecture if p.path.IsFirstHop() || p.ingressID != 0 || p.peering {
Indeed. Fix this instead by fixing the ingressInterface()
helper function; this now needs to avoid looking back past the segment switch for peering links.
@@ -1166,7 +1172,7 @@
func (p *scionPacketProcessor) ingressInterface() uint16 {
info := p.infoField
hop := p.hopField
- if p.path.IsFirstHopAfterXover() {
+ if !p.peering && p.path.IsFirstHopAfterXover() {
var err error
info, err = p.path.GetInfoField(int(p.path.PathMeta.CurrINF) - 1)
router/dataplane.go
line 1035 at r12 (raw file):
// not fully grasping the implications yet though case ingress == topology.Peer || egress == topology.Peer: return processResult{}, nil
As peering hops are not considered segment changes, this is indeed the right place for these checks. Check for the specific combination of interface types (valley routing through peers is forbidden).
Suggestion:
case ingress == topology.Child && egress == topology.Peer:
return processResult{}, nil
case ingress == topology.Peer && egress == topology.Child:
return processResult{}, nil
router/dataplane.go
line 1059 at r12 (raw file):
return processResult{}, nil case ingress == topology.Peer && egress == topology.Child: return processResult{}, nil
These cases are not hit as peering hop is not considered a segment change here (p.segmentChange == false
). Remove these case
s.
router/control/conf.go
line 57 at r12 (raw file):
IA addr.IA Addr *net.UDPAddr IFID common.IFIDType
IfID
tools/topology/topo.py
line 311 at r12 (raw file):
intf = self._gen_br_intf(remote, public_addr, remote_addr, attrs, remote_type) if intf['link_to'] == 'peer': intf['remote_interface_id'] = r_ifid
Nit: nicer to set this in _gen_br_intf
.
Well-worded question. As much as I would like to work on this I will probably not find the time before end of next month, unfortunately. |
Ok, I've added the "help wanted" label for the meantime. The current status here is: works, but missing tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm restarting this. I merged this PR into a branch of mine so I can work on it. I guess the conversation will have to continue in the PR I'll submit once I have something new to offer.
Reviewable status: 19 of 71 files reviewed, 7 unresolved discussions (waiting on @benthor, @lukedirtwalker, and @matzf)
This is an attempt to address Issue #4093
Since the feature branch referenced by the original PR #4249 wasn't correctly rebased, we'll start over with a clean slate as suggested by @oncilla .
This change is![Reviewable](https://camo.githubusercontent.com/23b05f5fb48215c989e92cc44cf6512512d083132bd3daf689867c8d9d386888/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)