Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protocol: implement peering links #4299

Closed
wants to merge 11 commits into from
Closed

Conversation

benthor
Copy link
Contributor

@benthor benthor commented Nov 25, 2022

This is an attempt to address Issue #4093

Since the feature branch referenced by the original PR #4249 wasn't correctly rebased, we'll start over with a clean slate as suggested by @oncilla .


This change is Reviewable

@benthor
Copy link
Contributor Author

benthor commented Nov 30, 2022

Running bin/end2end_integration succeeds with the included peering-test.topo.

However, some of the traceroute tests run via bin/scion_integration fail. Here is a ping via one of the peering links that succeeds:

echo 0 | ./bin/scion ping -c 1 --sciond $(./scion.sh sciond-addr 113) -i 2-ff00:0:213,127.0.0.1
[...]
Using path:
  Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212 3>3 2-ff00:0:213] MTU: 1472 NextHop: 127.0.0.49:31022
[...]
1 packets transmitted, 1 received, 0% packet loss, time 2.015ms

Doing the same with a traceroute:

echo 0 | ./bin/scion traceroute --sciond $(./scion.sh sciond-addr 113) -i 2-ff00:0:213,127.0.0.1
[...]
Using path:
  Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212 3>3 2-ff00:0:213] MTU: 1472 NextHop: 127.0.0.49:31022

0 1-ff00:0:113,127.0.0.49:0 IfID=3 387µs 230µs 206µs
1 ?? * * *
2 ?? * * *
3 2-ff00:0:213,127.0.0.89:0 IfID=3 1.811ms 1.48ms 1.331ms
Error: packets were lost

Inspecting br1-ff00_0_112-3.log:

DEBUG          router/dataplane.go:483  SCMP    {"err": "scmp {cause=MAC verification failed {actual=fa5182d6f01c; cons_dir=true; curr_hf=2; curr_inf=1; expected=eb0cb3becbbc; if_id=0; seg_id=45995}; typecode=ParameterProblem(InvalidHopFieldMAC)}", "dst_addr": "127.0.0.42:31016"}

So MAC verification for SCMP is still an issue on peering links, at least in traceroute scenarios. Any pointers what I need to look at to make this work?

@benthor benthor force-pushed the peertest branch 3 times, most recently from 0557184 to d47f8d9 Compare December 1, 2022 15:04
@oncilla oncilla requested review from oncilla and matzf December 2, 2022 09:25
@oncilla oncilla changed the title protocol: (re)implement peering links protocol: implement peering links Dec 2, 2022
Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So MAC verification for SCMP is still an issue on peering links, at least in traceroute scenarios. Any pointers what I need to look at to make this work?

I would look at the code paths that differ between a SCION and SCMP info packet.
Probably, there is some small bug in the handling of the router.

What also strikes me as odd is that we have 6 interfaces, but there is only 4 output lines.
I would expect one line per interface.
Maybe the bug is also hiding in how the packet is composed

Reviewable status: 0 of 11 files reviewed, 1 unresolved discussion (waiting on @matzf)

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 11 files at r1.
Reviewable status: 3 of 11 files reviewed, 9 unresolved discussions (waiting on @benthor and @matzf)


pkg/slayers/path/scion/base.go line 79 at r2 (raw file):

	if int(s.PathMeta.CurrHF) >= s.NumHops-1 {
		s.PathMeta.CurrHF = uint8(s.NumHops - 1)
		return serrors.New(fmt.Sprintf("path already at end, currhf %d, numhops %d",

use the errCtx of serrors.New
(see https://pkg.go.dev/github.com/scionproto/scion@v0.8.0/pkg/private/serrors#New)

Suggestion:

		return serrors.New("path already at end", "curr_hf", s.PathMeta.CurrHF, "num_hops", s.NumHops)

private/topology/json/json.go line 107 at r2 (raw file):

	MTU        int             `json:"mtu"`
	BFD        *BFD            `json:"bfd,omitempty"`
	RemoteIFID common.IFIDType `json:"remote_if_id,omitempty"`

Suggestion:

RemoteIFID common.IFIDType `json:"remote_interface_id,omitempty"`

router/dataplane.go line 95 at r2 (raw file):

	linkTypes         map[uint16]topology.LinkType
	neighborIAs       map[uint16]addr.IA
	remotePeerIF      map[uint16]uint16

Suggestion:

peerInterfaces      map[uint16]uint16

router/dataplane.go line 257 at r2 (raw file):

// a different type, this method will return an error. This can only
// be called on a not yet running dataplane.
func (d *DataPlane) AddRemotePeer(localifID, remoteifID uint16) error {

Suggestion:

func (d *DataPlane) AddRemotePeer(local, remote uint16) error {

router/dataplane.go line 262 at r2 (raw file):

	}
	if _, exists := d.remotePeerIF[localifID]; exists {
		return serrors.WithCtx(alreadySet, "localifID", localifID)

Suggestion:

return serrors.WithCtx(alreadySet, "local_interface", localifID)

router/dataplane.go line 846 at r2 (raw file):

	}

	// TODO: proper error

Adding a comment, such that we do not forget about this.


router/dataplane.go line 848 at r2 (raw file):

	// TODO: proper error
	err := serrors.New("TODO: segment length error (peering)")
	result := processResult{}

drop this variable and just inline it.


router/dataplane.go line 862 at r2 (raw file):

	// The peer hop fields are the last hop field on the first path
	// segment and the first hop field of the second path segment.
	currHF := p.path.PathMeta.CurrHF

Something about condition feels off.

Based on the ConsDir we can determine whether we need to be the first hop or the last hop on the segment.

Also you are mentioning first or second path segment, but in the code we only check SegLen[0].
I wonder why it still works.

Code quote:

	// The peer hop fields are the last hop field on the first path
	// segment and the first hop field of the second path segment.
	currHF := p.path.PathMeta.CurrHF
	segLen := p.path.PathMeta.SegLen[0]
	p.peering = currHF == segLen-1 || currHF == segLen
	return result, nil

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r2.
Reviewable status: 4 of 11 files reviewed, 9 unresolved discussions (waiting on @benthor and @matzf)

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r2.
Reviewable status: 7 of 11 files reviewed, 9 unresolved discussions (waiting on @benthor and @matzf)

Copy link
Contributor Author

@benthor benthor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 11 files reviewed, 9 unresolved discussions (waiting on @matzf and @oncilla)


router/dataplane.go line 862 at r2 (raw file):

Previously, oncilla (Dominik Roos) wrote…

Something about condition feels off.

Based on the ConsDir we can determine whether we need to be the first hop or the last hop on the segment.

Also you are mentioning first or second path segment, but in the code we only check SegLen[0].
I wonder why it still works.

I see what you mean, something looks not quite right yet. I'm a bit out of my depth here, because this code was your own contribution in the old PR and the crucial improvement over my own attempts to make peering work again.

Copy link
Contributor Author

@benthor benthor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 11 files reviewed, 8 unresolved discussions (waiting on @matzf and @oncilla)


pkg/slayers/path/scion/base.go line 79 at r2 (raw file):

Previously, oncilla (Dominik Roos) wrote…

use the errCtx of serrors.New
(see https://pkg.go.dev/github.com/scionproto/scion@v0.8.0/pkg/private/serrors#New)

Done.


private/topology/json/json.go line 107 at r2 (raw file):

	MTU        int             `json:"mtu"`
	BFD        *BFD            `json:"bfd,omitempty"`
	RemoteIFID common.IFIDType `json:"remote_if_id,omitempty"`

Done.


router/dataplane.go line 95 at r2 (raw file):

	linkTypes         map[uint16]topology.LinkType
	neighborIAs       map[uint16]addr.IA
	remotePeerIF      map[uint16]uint16

Done.


router/dataplane.go line 257 at r2 (raw file):

// a different type, this method will return an error. This can only
// be called on a not yet running dataplane.
func (d *DataPlane) AddRemotePeer(localifID, remoteifID uint16) error {

Done.


router/dataplane.go line 262 at r2 (raw file):

	}
	if _, exists := d.remotePeerIF[localifID]; exists {
		return serrors.WithCtx(alreadySet, "localifID", localifID)

Done.


router/dataplane.go line 848 at r2 (raw file):

Previously, oncilla (Dominik Roos) wrote…

drop this variable and just inline it.

Done.

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r3.
Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)


router/dataplane.go line 862 at r2 (raw file):

Previously, benthor wrote…

I see what you mean, something looks not quite right yet. I'm a bit out of my depth here, because this code was your own contribution in the old PR and the crucial improvement over my own attempts to make peering work again.

Ah, actually now that I look at it closer, the conditions seems to be right:

currHF == segLen-1 : The last hop on the first segment
currHF == segLen: The first hop on the second segment

We do not need to check the second segment length because it is the first hop on the second segment.

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, my suspicion is that there is not really an issue in the router, but rather in the scion tool.
When doing traceroute, between the direct peers, no packet is even sent.

Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, after fixing traceroute, I got to the following:

root@1793b53d68a3:/share# scion traceroute 2-ff00:0:212,172.20.0.167 -i
Available paths to 2-ff00:0:212:
3 Hops:
[ 0] Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212] MTU: 1472 NextHop: 172.20.0.114:30042
5 Hops:
[ 1] Hops: [1-ff00:0:113 3>3 1-ff00:0:112 1>2 1-ff00:0:110 3>3 2-ff00:0:210 2>7 2-ff00:0:212] MTU: 1400 NextHop: 172.20.0.114:30042
Choose path: 0
Using path:
  Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212] MTU: 1472 NextHop: 172.20.0.114:30042

0 1-ff00:0:113,172.20.0.114 IfID=3 1.111ms 0.753ms 0.706ms
1 ?? * * *
2 1-ff00:0:112,172.20.0.100 IfID=42 1.782ms 1.329ms 1.491ms
3 ?? * * *
Error: packets were lost

Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is related to segments where the peer flag is set, but they are not traversed until the peer interface.
Investigating further

Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

root@1793b53d68a3:/share#  scion traceroute 2-ff00:0:212,172.20.0.167 -i  
Available paths to 2-ff00:0:212:
3 Hops:
[ 0] Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212] MTU: 1472 NextHop: 172.20.0.114:30042
5 Hops:
[ 1] Hops: [1-ff00:0:113 3>3 1-ff00:0:112 1>2 1-ff00:0:110 3>3 2-ff00:0:210 2>7 2-ff00:0:212] MTU: 1400 NextHop: 172.20.0.114:30042
Choose path: 0
Using path:
  Hops: [1-ff00:0:113 3>3 1-ff00:0:112 42>23 2-ff00:0:212] MTU: 1472 NextHop: 172.20.0.114:30042

0 1-ff00:0:113,172.20.0.114 IfID=3 1.148ms 0.683ms 0.828ms
1 1-ff00:0:112,172.20.0.99 IfID=3 1.908ms 1.138ms 1.136ms
2 1-ff00:0:112,172.20.0.100 IfID=42 2.586ms 1.848ms 2.487ms
3 2-ff00:0:212,172.20.0.164 IfID=23 2.708ms 2.239ms 2.095ms

alright, turns out IsXover is a trap. I will add a diff shortly

Reviewable status: 6 of 11 files reviewed, 3 unresolved discussions (waiting on @benthor and @matzf)

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff --git a/pkg/slayers/path/scion/base.go b/pkg/slayers/path/scion/base.go
index f6d7651dc..f0435a22d 100644
--- a/pkg/slayers/path/scion/base.go
+++ b/pkg/slayers/path/scion/base.go
@@ -84,7 +84,11 @@ func (s *Base) IncPath() error {
 	return nil
 }
 
-// IsXover returns whether we are at a crossover point.
+// IsXover returns whether we are at a crossover point. This includes all
+// segment switches, even over a peering link. Note that handling of a regular
+// segment switch and handling of a segment switch over a peering link are
+// fundamentally different. To distinguish the two, you will need to extract
+// the information from the info field.
 func (s *Base) IsXover() bool {
 	return s.PathMeta.CurrINF != s.infIndexForHF(s.PathMeta.CurrHF+1)
 }
diff --git a/router/dataplane.go b/router/dataplane.go
index 303e3ea41..d3054f94e 100644
--- a/router/dataplane.go
+++ b/router/dataplane.go
@@ -983,12 +983,15 @@ func (p *scionPacketProcessor) verifyCurrentMAC() (processResult, error) {
 				slayers.SCMPCodeInvalidHopFieldMAC),
 			},
 			&slayers.SCMPParameterProblem{Pointer: p.currentHopPointer()},
-			serrors.New("MAC verification failed", "expected", fmt.Sprintf(
-				"%x", fullMac[:path.MacLen]),
+			serrors.New("MAC verification failed",
+				"expected", fmt.Sprintf("%x", fullMac[:path.MacLen]),
 				"actual", fmt.Sprintf("%x", p.hopField.Mac[:path.MacLen]),
 				"cons_dir", p.infoField.ConsDir,
-				"if_id", p.ingressID, "curr_inf", p.path.PathMeta.CurrINF,
-				"curr_hf", p.path.PathMeta.CurrHF, "seg_id", p.infoField.SegID),
+				"if_id", p.ingressID,
+				"curr_inf", p.path.PathMeta.CurrINF,
+				"curr_hf", p.path.PathMeta.CurrHF,
+				"seg_id", p.infoField.SegID,
+			),
 		)
 	}
 	// Add the full MAC to the SCION packet processor,
@@ -1513,7 +1516,7 @@ func (p *scionPacketProcessor) prepareSCMP(scmpH *slayers.SCMP, scmpP gopacket.S
 	revPath := revPathTmp.(*scion.Decoded)
 
 	// Revert potential path segment switches that were done during processing.
-	if revPath.IsXover() {
+	if revPath.IsXover() && !p.peering {
 		if err := revPath.IncPath(); err != nil {
 			return nil, serrors.Wrap(cannotRoute, err, "details", "reverting cross over for SCMP")
 		}
@@ -1523,7 +1526,7 @@ func (p *scionPacketProcessor) prepareSCMP(scmpH *slayers.SCMP, scmpP gopacket.S
 	_, external := p.d.external[p.ingressID]
 	if external {
 		infoField := &revPath.InfoFields[revPath.PathMeta.CurrINF]
-		if infoField.ConsDir {
+		if infoField.ConsDir && !p.peering {
 			hopField := revPath.HopFields[revPath.PathMeta.CurrHF]
 			infoField.UpdateSegID(hopField.Mac)
 		}
diff --git a/scion/traceroute/traceroute.go b/scion/traceroute/traceroute.go
index 8c9c75251..175a58a87 100644
--- a/scion/traceroute/traceroute.go
+++ b/scion/traceroute/traceroute.go
@@ -145,6 +145,7 @@ func (t *tracerouter) Traceroute(ctx context.Context) (Stats, error) {
 		defer log.HandlePanic()
 		t.drain(ctx)
 	}()
+
 	prevXover := false
 	for i := 0; i < len(idxPath.HopFields); i++ {
 		hf := idxPath.PathMeta.CurrHF
@@ -162,7 +163,9 @@ func (t *tracerouter) Traceroute(ctx context.Context) (Stats, error) {
 				t.updateHandler(u)
 			}
 		}
-		xover := idxPath.IsXover()
+		// Peering links do not count as regular cross overs. For peering
+		// links we probe all interfaces on the path.
+		xover := idxPath.IsXover() && !info.Peer
 		// The last hop of the path isn't probed, only the ingress interface is
 		// relevant.
 		// At a crossover (segment change) only the ingress interface is

Reviewable status: 6 of 11 files reviewed, 2 unresolved discussions (waiting on @benthor and @matzf)

@benthor
Copy link
Contributor Author

benthor commented Dec 9, 2022

An odd bug keeps triggering on integration test. I've changed peering-test.topo to also trigger it when running ./bin/scion_integration. Here is what I've found so far:

  • MAC validation can fail when an AS has more than one peering link.
  • SCMP Errors are only reported by the border router(s) of the AS with the multiple peering links.
  • In some SCMP Errors, curr_hf, curr_inf and if_id are reported as 0.

Copy link
Contributor Author

@benthor benthor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 12 files reviewed, 2 unresolved discussions (waiting on @matzf and @oncilla)


router/dataplane.go line 862 at r2 (raw file):

Previously, oncilla (Dominik Roos) wrote…

Ah, actually now that I look at it closer, the conditions seems to be right:

currHF == segLen-1 : The last hop on the first segment
currHF == segLen: The first hop on the second segment

We do not need to check the second segment length because it is the first hop on the second segment.

Done.

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 11 files at r1, 1 of 4 files at r3, 3 of 3 files at r4, 1 of 1 files at r5.
Reviewable status: all files reviewed (commit messages unreviewed), 1 unresolved discussion (waiting on @benthor and @matzf)

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need a bit more information.

  • What changes to the topology did you do?
  • What tool fails: ping vs traceroute vs showpaths?
  • Between which hosts is the failure

Reviewable status: all files reviewed (commit messages unreviewed), 1 unresolved discussion (waiting on @benthor and @matzf)

@benthor
Copy link
Contributor Author

benthor commented Dec 12, 2022

I think I need a bit more information.

* What changes to the topology did you do?

My latest commit changed the following:

diff --git a/topology/peering-test.topo b/topology/peering-test.topo
index 40aed961f..b32873948 100644
--- a/topology/peering-test.topo
+++ b/topology/peering-test.topo
@@ -33,6 +33,7 @@ links:
   - {a: "1-ff00:0:110#2", b: "1-ff00:0:112#1", linkAtoB: CHILD, bw: 500}
   - {a: "1-ff00:0:112#3", b: "1-ff00:0:113#3", linkAtoB: CHILD, bw: 500}
   - {a: "1-ff00:0:113#4", b: "1-ff00:0:114#4", linkAtoB: CHILD, bw: 500}
+  - {a: "1-ff00:0:112#47", b: "1-ff00:0:114#11", linkAtoB: PEER}
   - {a: "1-ff00:0:110#3", b: "2-ff00:0:210#3", linkAtoB: CORE}
   - {a: "2-ff00:0:210#1", b: "2-ff00:0:211#3", linkAtoB: CHILD, mtu: 1280}
   - {a: "2-ff00:0:210#2", b: "2-ff00:0:212#7", linkAtoB: CHILD, bw: 500}
   - {a: "2-ff00:0:212#3", b: "2-ff00:0:213#3", linkAtoB: CHILD, bw: 500}
   - {a: "1-ff00:0:112#42", b: "2-ff00:0:212#23", linkAtoB: PEER}
* What tool fails: ping vs traceroute vs showpaths?

Ping as well as traceroute fail. It's easiest to verify when running bin/end2end_integration and/or bin/scion_integration.

* Between which hosts is the failure

Between any hosts that select a path via one of the two peering links offered by AS -112. Example:

echo "0" | ./bin/scion ping -c 1 --sciond "$(./scion.sh sciond-addr 213)" -i 1-ff00:0:114,127.0.0.1
Available paths to 1-ff00:0:114:
5 Hops:
[ 0] Hops: [2-ff00:0:213 3>3 2-ff00:0:212 23>42 1-ff00:0:112 3>3 1-ff00:0:113 4>4 1-ff00:0:114] MTU: 1472 NextHop: 127.0.0.97:31056
7 Hops:
[ 1] Hops: [2-ff00:0:213 3>3 2-ff00:0:212 7>2 2-ff00:0:210 3>3 1-ff00:0:110 2>1 1-ff00:0:112 3>3 1-ff00:0:113 4>4 1-ff00:0:114] MTU: 1400 NextHop: 127.0.0.97:31056
Choose path: Resolved local address:
  127.0.0.1
Using path:
  Hops: [2-ff00:0:213 3>3 2-ff00:0:212 23>42 1-ff00:0:112 3>3 1-ff00:0:113 4>4 1-ff00:0:114] MTU: 1472 NextHop: 127.0.0.97:31056

PING 1-ff00:0:114,127.0.0.1:0 pld=0B scion_pkt=124B
ERROR: not SCMPEchoReply {type=snet.SCMPParameterProblem}

--- 1-ff00:0:114,127.0.0.1 statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 1.869ms
Error: no reply packet received

@benthor
Copy link
Contributor Author

benthor commented Dec 12, 2022

Addendum: The bug has an interesting side-effect on scion showpaths: In the working topology (with just one peering link), a path over the peering link is marked as "alive"

./bin/scion showpaths --sciond "$(./scion.sh sciond-addr 213)" 1-ff00:0:114
Available paths to 1-ff00:0:114
5 Hops:
[0] Hops: [2-ff00:0:213 3>3 2-ff00:0:212 23>42 1-ff00:0:112 3>3 1-ff00:0:113 4>4 1-ff00:0:114] MTU: 1472 NextHop: 127.0.0.89:31052 Status: alive LocalIP: 127.0.0.1
7 Hops:
[1] Hops: [2-ff00:0:213 3>3 2-ff00:0:212 7>2 2-ff00:0:210 3>3 1-ff00:0:110 2>1 1-ff00:0:112 3>3 1-ff00:0:113 4>4 1-ff00:0:114] MTU: 1400 NextHop: 127.0.0.89:31052 Status: alive LocalIP: 127.0.0.1

With an additional peering link on AS -112 enabled, showpaths reports the status of the same peering link as "timeout":

./bin/scion showpaths --sciond "$(./scion.sh sciond-addr 213)" 1-ff00:0:114
Available paths to 1-ff00:0:114
5 Hops:
[0] Hops: [2-ff00:0:213 3>3 2-ff00:0:212 23>42 1-ff00:0:112 3>3 1-ff00:0:113 4>4 1-ff00:0:114] MTU: 1472 NextHop: 127.0.0.97:31056 Status: timeout LocalIP: 127.0.0.1
7 Hops:
[1] Hops: [2-ff00:0:213 3>3 2-ff00:0:212 7>2 2-ff00:0:210 3>3 1-ff00:0:110 2>1 1-ff00:0:112 3>3 1-ff00:0:113 4>4 1-ff00:0:114] MTU: 1400 NextHop: 127.0.0.97:31056 Status: alive LocalIP: 127.0.0.1

@benthor
Copy link
Contributor Author

benthor commented Dec 16, 2022

Unrelated question: should I rebase or merge from master at this point or would that just complicated matters?

@matzf
Copy link
Collaborator

matzf commented Jan 11, 2023

Unrelated question: should I rebase or merge from master at this point or would that just complicated matters?

Generally, feel free to merge or rebase at any point. The Reviewable tool is very robust and does not easily get confused. The rebase-and-squash merge workflow that we use in this repository means that you don't need to worry about a clean commit history in your branch.

The merge conflicts that currently come up only come from moving the function calls for validateHopExpiry and validateIngressID out from parsePath (see #4157). This should be straight forward to resolve.

Btw. I've looked at the problem that show up with the changed peering topology, but haven't been able to figure out what is going on yet. I'll try to look into this again and get back to you by next week.

@benthor benthor requested a review from a team as a code owner January 23, 2023 16:42
@benthor
Copy link
Contributor Author

benthor commented Jan 23, 2023

I reverted peering-test.topo to the simple topology with the single, "working" peering link and created peering-test-multi.topo with multiple peering links that demonstrates the above-mentioned problems.

Not sure if rebasing at this point wasn't a mistake. Some chaos couldn't be avoided, plus I had to make a few validation exceptions for peering (marked with XXX(benthor) in the code).

Even with the simple topology, bin/end2end_integration now fails with "no path found" across certain endpoint combinations, although bin/scion_integration succeeds and manually testing peering links (via, e.g., echo "0" | ./bin/scion ping -c 1 --sciond "$(./scion.sh sciond-addr 213)" -i 1-ff00:0:114,127.0.0.1) also yields 0% packet loss.

Some pointers about where to go from here would be appreciated.

@matzf
Copy link
Collaborator

matzf commented Jan 24, 2023

I had to make a few validation exceptions for peering (marked with XXX(benthor) in the code).

Yep, the checks in validateEgressID look right to. Shouldn't we also accept the case ingress == topology.Peer && egress == topology.Unset: though?
I think we shouldn't need the blanket exception for peering hops in validateTransitUnderlaySrc, but perhaps the logic to determine the ingressInterface() needs to be adapted for peer hops.

Sorry that I've not managed to give a more detailed feedback on your issues. I'm on leave until mid next week, but I will try to look into it after I return. Maybe someone else can already have a look in the meantime.

@benthor
Copy link
Contributor Author

benthor commented Jan 24, 2023

Yep, the checks in validateEgressID look right to. Shouldn't we also accept the case ingress == topology.Peer && egress == topology.Unset: though?

That's already covered by the validation exception that I added (case ingress == topology.Peer || egress == topology.Peer:) As mentioned in the comment, my understanding of the actual implications of the different possible combinations is limited, so the above might be too permissive in certain cases.

@benthor
Copy link
Contributor Author

benthor commented Feb 10, 2023

The latest commit contains a patch by @matzf for a tricky regression that apparently got introduced by changing a PeerEntry struct field from pointer to value and thereby breaking the semantics of a critical for loop body. See also go discussion #56010.

With this fix, the issues that plagued this PR for the last two months finally seem resolved. Peering links work again (also in more complex topologies).

In other words, we should be back on track. 🎉

Copy link
Collaborator

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and appears to be working well now!

Some specific tests for the peering cases in the router would still be required before merging this.
There are two places, and ideally we have cases for both sets of tests: TestProcessPkt in dataplane_test.go, which is sort of a unit test, and the braccept router integration tests (see tools/braccept/cases/doc.go).
Would you have capacity work on this?

Reviewed 2 of 11 files at r1, 1 of 3 files at r4, 1 of 27 files at r6, 1 of 39 files at r8, 7 of 59 files at r10, 5 of 7 files at r12, all commit messages.
Reviewable status: 19 of 71 files reviewed, 7 unresolved discussions (waiting on @benthor and @lukedirtwalker)


private/topology/json/json.go line 115 at r12 (raw file):

	MTU        int             `json:"mtu"`
	BFD        *BFD            `json:"bfd,omitempty"`
	RemoteIFID common.IFIDType `json:"remote_interface_id,omitempty"`

Nit: "official" preferred form is IfID.


router/dataplane.go line 989 at r12 (raw file):

	// XXX(benthor) disabling check in case of peering packet
	// is probably insecture
	if p.path.IsFirstHop() || p.ingressID != 0 || p.peering {

Indeed. Fix this instead by fixing the ingressInterface() helper function; this now needs to avoid looking back past the segment switch for peering links.

@@ -1166,7 +1172,7 @@
 func (p *scionPacketProcessor) ingressInterface() uint16 {
        info := p.infoField
        hop := p.hopField
-       if p.path.IsFirstHopAfterXover() {
+       if !p.peering && p.path.IsFirstHopAfterXover() {
                var err error
                info, err = p.path.GetInfoField(int(p.path.PathMeta.CurrINF) - 1)

router/dataplane.go line 1035 at r12 (raw file):

		//        not fully grasping the implications yet though
		case ingress == topology.Peer || egress == topology.Peer:
			return processResult{}, nil

As peering hops are not considered segment changes, this is indeed the right place for these checks. Check for the specific combination of interface types (valley routing through peers is forbidden).

Suggestion:

		case ingress == topology.Child && egress == topology.Peer:
			return processResult{}, nil
		case ingress == topology.Peer && egress == topology.Child:
			return processResult{}, nil

router/dataplane.go line 1059 at r12 (raw file):

		return processResult{}, nil
	case ingress == topology.Peer && egress == topology.Child:
		return processResult{}, nil

These cases are not hit as peering hop is not considered a segment change here (p.segmentChange == false). Remove these cases.


router/control/conf.go line 57 at r12 (raw file):

	IA   addr.IA
	Addr *net.UDPAddr
	IFID common.IFIDType

IfID


tools/topology/topo.py line 311 at r12 (raw file):

        intf = self._gen_br_intf(remote, public_addr, remote_addr, attrs, remote_type)
        if intf['link_to'] == 'peer':
            intf['remote_interface_id'] = r_ifid

Nit: nicer to set this in _gen_br_intf.

@benthor
Copy link
Contributor Author

benthor commented Jul 3, 2023

Would you have capacity work on this?

Well-worded question. As much as I would like to work on this I will probably not find the time before end of next month, unfortunately.

@matzf matzf added the i/help wanted Nobody is working on this at the moment label Jul 3, 2023
@matzf
Copy link
Collaborator

matzf commented Jul 3, 2023

Ok, I've added the "help wanted" label for the meantime. The current status here is: works, but missing tests.
Whoever starts working on this, please coordinate in this PR.

Copy link
Contributor

@jiceatscion jiceatscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm restarting this. I merged this PR into a branch of mine so I can work on it. I guess the conversation will have to continue in the PR I'll submit once I have something new to offer.

Reviewable status: 19 of 71 files reviewed, 7 unresolved discussions (waiting on @benthor, @lukedirtwalker, and @matzf)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i/help wanted Nobody is working on this at the moment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants