Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scionlab colibri feature #101

Merged
merged 4 commits into from
Oct 14, 2021

Conversation

juagargi
Copy link
Member

@juagargi juagargi commented Jul 29, 2021

This PR introduces the COLIBRI service.
The changes in this PR come from several long lived branches, and are not yet completely cleaned up.
I'm already opening the PR up as a draft to start with the review process, and to let the members of the SCIONLab team to check possible dangerous points present in the PR.
A basic example of a client can be found on netsec-ethz/scion-apps#197

The PR is in draft until the following is completed:

  • Rebase on latest scionlab branch, solve conflicts
  • Remove any deleteme notes
  • Some parts of the control service are still living under go/cs/, and need to be moved
  • Remove any deprecated TODO notes. Review rest of them
  • Dataplane still has issues
  • Client basic API
  • Allow empty configuration for the colibri service
  • Integration test
  • Pass integration tests with all local topologies, in parallel
  • Defects document https://docs.google.com/document/d/13DG2vEcZP1CsjTikASKb8TmQ0f-vBoX4XJEzhX85yXw/edit?usp=sharing
  • Test with SvcCOL located on the same machine as the CS
  • Test deploy at two machines in scionlab, with configuration from Coordinator (seg. reservations between them).

The colibri service (control plane) has been tested for a while already, but still some bugs can (probably will) be found there. It should not be critical for the health of the rest of SCIONLab if only the colibri service goes down, although undesirable.

I would like when possible a thorough review on:

  • Checks on safe or no modification to CS and BR. In general, anything not related to COLIBRI should remain as stable as possible. This would apply to their configuration as well.
  • Break control plane of COLIBRI under normal operation. I.e. correct/valid configuration
  • Add a command to list reservations

This change is Reviewable

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff!

Before diving into the details, I would first like to discuss the design of daemon API (briefly quickly in the chat); the current API seems low level and directly passes through to the CS. My understanding is that there will be a more user friendly API that wraps this. This would be implemented in a library, instead of in the daemon. Wouldn't it be feasible to expose the user friendly API on the daemon, instead of the low level one, and move the logic from the library to the daemon?
This would seem to be more in line with the design of the normal Path querying functionality, where the queries multiple segments and returns the combined paths. The advantages, imho, would be that application library can be lighter, and that applications written without our libraries (e.g. implemented in different language) can still benefit from the higher-level functionality. Note that the low-level functionality is still available to applications by bypassing the daemon and accessing the APIs of the CS directly.

Note: this can of course be implemented as part of some (API breaking) refactoring.

Reviewable status: 0 of 192 files reviewed, all discussions resolved / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for already starting to look into this!
Also thanks for the insights in your comments, they are very helpful.
I just included a changeset with an extended API in the form of some calls to a library. Let me address here . the questions you asked:

Wouldn't it be feasible to expose the user friendly API on the daemon, instead of the low level one, and move the logic from the library to the daemon?

It would be possible to move some things from the library to the daemon. How convenient this is, would be up for debate. Let me put two examples:

  • The extended API allows to provide sorting functions to select the best reservation. Moving this to the daemon wouldn't allow the user to provide their own code to do whatever they want to sort the reservations.
  • The extended API allows for an auto renewal of the reservation. This could be moved to the daemon space, but with some caveats:
    • The callback onError can be called from the client-side of the daemon communication, but this means we will nevertheless have to have a thread in this space, to allow for the callback at any moment (inversion of control / event)
    • The daemon is a common space for all applications, and it might be not the right place to perform a control task in behalf of the application.
    • The positive side of moving it would be to unify all renewal calls that the host makes to the CO. This could allow the daemon to accept or deny reservations itself, by means of distinguishing "priorities" among applications. Sort of having a "dispatcher" (shivers 😨 ) of bandwidth.

This would seem to be more in line with the design of the normal Path querying functionality, where the queries multiple segments and returns the combined paths. The advantages, imho, would be that application library can be lighter, and that applications written without our libraries (e.g. implemented in different language) can still benefit from the higher-level functionality.

True, it would be more in line with the current path querying functions, which is a plus. Having the daemon running combine for the applications is, maybe, suboptimal, as there is no way to provide decision-making code to sort the paths/reservations.
While I see the cache in the daemon as a benefit, I don't share the opinion that combining the parts inside the daemon is a good idea; unless we need to trim the sets of the parts (e.g. too many segments) because it's very expensive to send them.
Additionally, moving the logic from the daemon to the application makes the daemon simpler (at the expense of the application). This could be good to e.g. reduce the number of bugs, etc.
I see the point of having a light application library, but not to the expense to move the complexity to a daemon, to which the app communicates via RPC. Also, while this would simplify the porting of that code, it might introduce difficulties into the RPC mechanism, events, etc. I guess that it will depend on the language and tools at our disposal.

Of course nothing of this has to be written in stone. We can definitely provide more functionality in the daemon, and maybe remove it in the library. At this point, we don't even know how the API will be really used.

In order to move easily forward, I propose to open an issue that documents the pros/cons of both approaches, and start weighing them for the next iteration of the API.

Reviewable status: 0 of 200 files reviewed, all discussions resolved / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All clear. In that case, I think it would make sense to strip this out of the daemon entirely for now. This is currently directly proxying the requests to the CO and this only seems to add complexity and overhead with no tangible benefits.

Reviewed 2 of 122 files at r5, 1 of 29 files at r6, 3 of 147 files at r7.
Reviewable status: 5 of 200 files reviewed, 3 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go.mod, line 46 at r8 (raw file):

	github.com/vishvananda/netns v0.0.0-20170219233438-54f0e4339ce7 // indirect
	github.com/xeipuuv/gojsonschema v1.2.0
	go.etcd.io/etcd v0.0.0-20191023171146-3cf2f69b5738

Unused.


scion.sh, line 59 at r8 (raw file):

    tar -kxf bazel-bin/scion.tar -C bin
    tar -kxf bazel-bin/scion-ci.tar -C bin
}

This is equivalent to make .


proto/discovery/v1/discovery.proto, line 74 at r9 (raw file):

message ColibriServicesResponse {
    repeated string address = 1;

There seem to be two ways to "discover" colibri service; the service address and this discovery service protocol. Isn't this redundant?

Btw, does this really need to return multiple addresses here?

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 5 files at r1, 3 of 25 files at r2, 4 of 122 files at r5, 1 of 29 files at r6, 16 of 147 files at r7, 2 of 28 files at r9, 1 of 6 files at r10.
Reviewable status: 34 of 200 files reviewed, 21 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


doc/protocols/scion-header.rst, line 806 at r10 (raw file):

        \text{TsRel} &= \text{max} \left\{0,
            \frac{\text{Ts - Timestamp}_{ns}}
            {\text{q}} -1 \right\} \\

I don't understand why we bother with the -1 here? Packets sent at time 0 of the reservation are valid, right? So why the special case? The range can cover the 16s either way.


go/lib/addr/host.go, line 289 at r10 (raw file):

		return "SIG"
	case SvcCOL:
		return "COL"

Nit; I think SvcColibri and "Colibri" would be nicer.


go/lib/colibri/colibri.go, line 40 at r10 (raw file):

	// clockSkewMs denotes the maximal clock skew in milliseconds
	clockSkewMs   uint16 = 1000
	clockSkewNano uint64 = uint64(clockSkewMs) * 1000000

Should use time.Duration type:

packetLifetime = 2 * time.Second
clockSkew  = time.Second

go/lib/colibri/colibri.go, line 57 at r10 (raw file):

	// |    CoreID     |                  CoreCounter                  |
	// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	b := make([]byte, 8)

This could be b := [8]byte{} instead (to avoid the heap allocation, hopefully).


go/lib/colibri/colibri.go, line 59 at r10 (raw file):

	b := make([]byte, 8)
	binary.BigEndian.PutUint32(b[4:8], coreCounter)
	binary.BigEndian.PutUint16(b[3:5], uint16(coreID))

This bit confused me a lot. It seems to be correct, but overwriting byte 4 with the coreID and then overwriting byte 3 with the timestamp was really surprising.
Maybe this could be changed to packedId := (coreCounter & 0xff ff ff) | (coreID & 0xff << 24). Then we can even invoke the CreateColibriTimestampCustom and avoid the repeated implementation.


go/lib/colibri/colibri.go, line 61 at r10 (raw file):

	binary.BigEndian.PutUint16(b[3:5], uint16(coreID))
	binary.BigEndian.PutUint32(b[:4], tsRel)
	packetTimestamp = binary.BigEndian.Uint64(b[:8])

This seems very awkward; this value is now byte-reversed this packet timestamp in an uint64 but then is reversed again when it's actually put into the packet. Same for parsing the packet (byte reversed on parsing) and then again when parsing the timestamp Perhaps it would be clearer to use a separate type for this packet timestamp instead, type PacketTimestamp [8]byte.


go/lib/colibri/colibri.go, line 82 at r10 (raw file):

// ParseColibriTimestamp reads tsRel, coreID, and coreCounter from the packetTimestamp.
func ParseColibriTimestamp(packetTimestamp uint64) (tsRel uint32, coreID uint8,

This function should never be used on incoming packets because there is no requirement to create the timestamp with this default coreID/coreCounter split. I'd remove this function, and rename ParseColibriTimestampCustom to ParseColibriTimestamp.


go/lib/colibri/colibri.go, line 121 at r10 (raw file):

	diff := nowNano - timestampNano
	tsRel := max(0, (diff/4)-1)
	return uint32(tsRel), nil

Use time.Time:

expiration := time.Unix(4 * expirationTick, 0)
earliest := expiration.Add(-16 * time.Second)
now := time.Now()
if now.After(expiration) { ... }
if now.Before(earliest) { ... }
diff := now.Sub(earliest)
return diff.Nanoseconds() / 4 // See comment in scion-header.rst -- why even bother with the -1.

go/lib/colibri/colibri.go, line 138 at r10 (raw file):

func VerifyTimestamp(expirationTick uint32, packetTimestamp uint64) bool {
	nowNano := uint64(time.Now().UnixNano())
	timestampNano := (4*uint64(expirationTick) - 16) * 1000000000

This should again use time.Time/Duration types.

now := time.Now()
timestamp := time.Unix(4 * expirationTick - 16), 0)
if now.Before(timestamp.Add(-clockSkew)) { ... }

go/lib/sciond/sciond.go, line 111 at r10 (raw file):

	// ColibriCleanupRsv cleans an E2E reservation. The ID must be E2E compliant.
	// This method may return an E2EResponseError.
	ColibriCleanupRsv(ctx context.Context, req *reservation.ID, index reservation.IndexNumber) error

As commented in the top-level discussion, let's drop this API from the daemon and directly send the requests to the colibri service.


go/lib/scrypto/mac.go, line 36 at r10 (raw file):

var (
	hfMacSalt   = []byte("Derive OF Key")
	ColibriSalt = []byte("Derive Colibri Key")

Should not be exported.


go/lib/scrypto/mac.go, line 72 at r10 (raw file):

// defaults used by pycrypto.
func DeriveHFMacKey(key []byte) ([]byte, error) {
	return pbkdf2.Key(key, hfMacSalt, 1000, 16, sha256.New), nil

Remove the error return if this never errors. Previously, this had a check for the length of the key with a panic (assertion) -- maybe that can be added back instead of the error.


go/pkg/router/colibri_processing.go, line 39 at r10 (raw file):

	scionLayer slayers.SCION
	// origPacket is the raw original packet, must not be modified.
	origPacket []byte

Unused (an the corresponding field in the "main" packet processor has been removed on master).


go/pkg/router/colibri_processing.go, line 66 at r10 (raw file):

	// Forward the packet to the correct entity
	return c.forward()
}

I really like how straight forward and readable this validation and forwarding logic is!

What is missing is SCMP errors for invalid packets and for the internal/external interface down cases (should at least be mentioned in a TODO). Note that there have been some changes in master/scionlab related to this, so perhaps you want look at this after rebasing.


go/pkg/router/colibri_processing.go, line 189 at r10 (raw file):

	if c.colibriPathMinimal.CurrHopField == nil {
		return 0, serrors.New("colibri hop field must not be nil")
	}

Why all these checks? Programming error, just let it crash (i.e. just access the fields, or check and panic).


go/pkg/router/colibri_processing.go, line 270 at r10 (raw file):

func (c *colibriPacketProcessor) destinedToLocalHost(egressId uint16) bool {
	isLast, _ := c.colibriPathMinimal.IsLastHop()
	return c.scionLayer.DstIA.Equal(c.d.localIA) && egressId == 0 && isLast

All these checks should be redundant if the packet is valid. Explicit validation of this is missing, and I believe this is also not covered by cryptographic check. Now a packet that would fail one of these checks would appear to land in a weird edge case (attempted to be forwarded).


go/pkg/router/dataplane.go, line 100 at r10 (raw file):

	svc               *services
	macFactory        func() hash.Hash
	ColibriKey        []byte

Should not be exported.


go/pkg/router/dataplane_test.go, line 1310 at r10 (raw file):

			mockMsg: func(afterProcessing bool, expTick, tsRel uint32) *ipv4.Message {
				spkt, cpath := prepColibriBaseMsg(false, false, false, 4, 5, expTick, tsRel)
				dst := &net.IPAddr{IP: net.ParseIP("10.0.0.3").To4()}

I could be mistaken, but I think you don't need these To4() calls everywhere.

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I still don't have it in the daemon 🤦‍♂️ , there should be a cache of segment reservations in it, very much like drkeys or path segments. We don't expect the segment reservations to change much over time, so this cache is highly beneficial. I haven't had the time yet to properly design it, that's why the daemon is not doing anything yet.

Reviewable status: 34 of 200 files reviewed, 20 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go.mod, line 46 at r8 (raw file):

Previously, matzf (Matthias Frei) wrote…

Unused.

Done.


scion.sh, line 59 at r8 (raw file):

Previously, matzf (Matthias Frei) wrote…

This is equivalent to make .

Ups, got there after a rebase on a biweekly update. Gone.


go/lib/addr/host.go, line 289 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit; I think SvcColibri and "Colibri" would be nicer.

I just followed the gist of the existing names.


go/lib/sciond/sciond.go, line 111 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

As commented in the top-level discussion, let's drop this API from the daemon and directly send the requests to the colibri service.

I'm not in favor of dropping it, as I mentioned at the top level comment: there will be a cache for the segment reservations in the daemon, which should be used, and rate limit the queries to the colibri service.


go/lib/scrypto/mac.go, line 36 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Should not be exported.

Done.


go/pkg/router/dataplane.go, line 100 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Should not be exported.

Done.


proto/discovery/v1/discovery.proto, line 74 at r9 (raw file):
The service address discovery via anycast will not work. I found out about that when the colibri service shares the same machine as the control service.
After a discussion with Sam and Dominik, the outcome is that we should drop as much as possible the anycasts in favor for the discovery service.

Btw, does this really need to return multiple addresses here?

It looks to me like an appropriate signature (e.g. same as hidden paths), although we are always returning our only service.

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand. As long as the cache is not there though, this is just unnecessary bloat that we'll have to maintain. As the APIs are not frozen yet anyway, I don't see an advantage in adding this placeholder. I'd suggest to remove it, maybe store a patch that you can apply to get started when you want to add it back.

Reviewed 1 of 122 files at r5, 1 of 6 files at r10.
Reviewable status: 36 of 200 files reviewed, 19 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/addr/host.go, line 289 at r10 (raw file):

Previously, juagargi (Juan A. Garcia Pardo) wrote…

I just followed the gist of the existing names.

The existing names are acronyms , COL is not.


proto/discovery/v1/discovery.proto, line 74 at r9 (raw file):

Previously, juagargi (Juan A. Garcia Pardo) wrote…

The service address discovery via anycast will not work. I found out about that when the colibri service shares the same machine as the control service.
After a discussion with Sam and Dominik, the outcome is that we should drop as much as possible the anycasts in favor for the discovery service.

Btw, does this really need to return multiple addresses here?

It looks to me like an appropriate signature (e.g. same as hidden paths), although we are always returning our only service.

Ok, I really do not agree but I see this is a more general topic. Either way, why still have the anycast address then?

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 33 of 200 files reviewed, 28 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/colibri/colibri.go, line 180 at r10 (raw file):

	}

	if !bytes.Equal(mac[:4], currHop.Mac[:4]) {

subtle.ConstantTimeCompare(mac[:4], currHop.Mac)


go/lib/colibri/colibri.go, line 207 at r10 (raw file):

	// Initialize cryptographic MAC function
	f, err := initColibriMac(privateKey)

This MAC function should be reused (in particular for the VerifyMAC use in the router), like for the hop field MAC. Initializing an AES cipher allocates memory and does some work to expand the keys.


go/lib/colibri/colibri.go, line 219 at r10 (raw file):

		return nil, serrors.New("colibri static mac input has invalid length", "expected", 16,
			"is", len(input))
	}

Programming error; no need to check, just let it panic in CryptBlocks.


go/lib/colibri/colibri.go, line 221 at r10 (raw file):

	}
	// Calculate CBC-MAC = first 4 bytes of the last CBC block
	mac := make([]byte, len(input))

Avoid this allocation and the allocation for the input buffer (in particular for the VerifyMAC use in the router).; let the caller provide a buffer and encrypt in-place, similar to how this is done for the HF MAC (improvements on master). Same applies to the other CalculateColibriMac* and prepareMacInput* functions.


go/lib/colibri/colibri.go, line 223 at r10 (raw file):

16

Many repetitions of this 16, maybe nicer to define const blockSize = 16 (or use aes.BlockSize).


go/lib/colibri/colibri.go, line 248 at r10 (raw file):

// CalculateColibriMacPacket calculates the per-packet colibri MAC.
func CalculateColibriMacPacket(auth []byte, packetTimestamp uint64,

Can we include the function CalculateColibriMacSigma in CalculateColibriMacPacket? E.g.. keep it two separate local helper functions, but the public function only needs to be available for the combined functionality (nothing is caching the intermediate result anyway).


go/lib/colibri/colibri.go, line 274 at r10 (raw file):

// buffer is expected to be at least `LengthInputData` bytes long.
// suffix is expected to be at most 12 byte long.
func MACInput(buffer []byte, suffix []byte, expTick uint32,

I'm confused here, why is there another way to compute the MAC? Isn't MACInput + StaticMAC the same as CalculateColibriMacStatic? What is the difference between MACInput and prepareMacInputStatic?


go/lib/colibri/colibri.go, line 337 at r10 (raw file):

	dstLen := len(s.RawDstAddr)
	consistent := (4*(int(s.DstAddrLen)+1) == dstLen) &&
		(4*(int(s.SrcAddrLen)+1) == srcLen)

Shouldn't this rather be checked when parsing the SCION packet (I guess it already is). If this is just an assertion, panic instead of returning an error.


go/lib/colibri/colibri.go, line 350 at r10 (raw file):

	// up to the next multiple of 16 bytes
	bufLen := LengthInputData + 1 + srcLen + dstLen
	nrBlocks := uint8(math.Ceil(float64(bufLen) / 16))

nrBlocks := (bufLen + blockSize - 1) / blockSize, no float.


go/lib/colibri/colibri.go, line 390 at r11 (raw file):

// prepareInputData writes InputData to the given buffer.
func prepareInputData(srcAS addr.AS, inf *colibri.InfoField,
	hop *colibri.HopField, buffer []byte) error {

I got a bit lost here with the various prepareInput... functions. What could help, I think, would be to use a more specific term for what is called "Input Data" here and in the docs, e.g. "Reservation Info" or something like that. Consequently, rename this function to e.g. prepareMacInputReservationInfo.


go/lib/colibri/colibri.go, line 399 at r11 (raw file):

	}

	copy(buffer[0:12], inf.ResIdSuffix)

For the segment reservations, this is only 4 bytes, right? Removing these 8 bytes of padding would seem to allow to bring this down to a single block for the static MAC (although these are probably not the ones that we need to optimize).

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 33 of 200 files reviewed, 28 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


doc/protocols/scion-header.rst, line 806 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

I don't understand why we bother with the -1 here? Packets sent at time 0 of the reservation are valid, right? So why the special case? The range can cover the 16s either way.

This part came verbatim from epic HP, so I might be wrong with my answer: the numerator can go up to 16 seconds, and the denominator is always normalizing in 4 ns units, so the fraction would go as high as 2^32, which doesn't fit in 4 bytes. With -1 it fits.
Paging @mlegner here.


go/lib/colibri/colibri.go, line 40 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Should use time.Duration type:

packetLifetime = 2 * time.Second
clockSkew  = time.Second

Done.


go/lib/colibri/colibri.go, line 57 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

This could be b := [8]byte{} instead (to avoid the heap allocation, hopefully).

Done.


go/lib/colibri/colibri.go, line 61 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

This seems very awkward; this value is now byte-reversed this packet timestamp in an uint64 but then is reversed again when it's actually put into the packet. Same for parsing the packet (byte reversed on parsing) and then again when parsing the timestamp Perhaps it would be clearer to use a separate type for this packet timestamp instead, type PacketTimestamp [8]byte.

Agree, it's driving me crazy. Although the refactor has taken me > 1 hour.


go/lib/colibri/colibri.go, line 82 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

This function should never be used on incoming packets because there is no requirement to create the timestamp with this default coreID/coreCounter split. I'd remove this function, and rename ParseColibriTimestampCustom to ParseColibriTimestamp.

Wait, I see in the header doc that this is how the colibri packet looks like.
Note that we recommend to use the CoreID and CoreCounter split, and I guess we will follow our recommendation 🤷‍♂️
I have slightly change these two functions though, following previous advice, PTAL.


go/lib/colibri/colibri.go, line 121 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Use time.Time:

expiration := time.Unix(4 * expirationTick, 0)
earliest := expiration.Add(-16 * time.Second)
now := time.Now()
if now.After(expiration) { ... }
if now.Before(earliest) { ... }
diff := now.Sub(earliest)
return diff.Nanoseconds() / 4 // See comment in scion-header.rst -- why even bother with the -1.

Done.


go/lib/colibri/colibri.go, line 138 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

This should again use time.Time/Duration types.

now := time.Now()
timestamp := time.Unix(4 * expirationTick - 16), 0)
if now.Before(timestamp.Add(-clockSkew)) { ... }

Done.

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 32 of 200 files reviewed, 28 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


proto/discovery/v1/discovery.proto, line 74 at r9 (raw file):

Previously, matzf (Matthias Frei) wrote…

Ok, I really do not agree but I see this is a more general topic. Either way, why still have the anycast address then?

There is (was) an anycast resolver in client.go that I forgot to remove. It's gone now.
In the border router the C=1 packets are forwarded to the colibri anycast address. This is a bug that has to be solved, but only prevents the packets from the control plane itself from using reservations. I.e., when a reservation has to be renewed, it should use the existing reservation. For now, that code is disabled (also in client.go).

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 32 of 200 files reviewed, 28 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/addr/host.go, line 289 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

The existing names are acronyms , COL is not.

Wait, I thought that SvcSB was for Sibra.
Anyway this looks to me like a discussion on a minor topic, and I would gladly think about changing it if I had more time.

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 32 of 200 files reviewed, 28 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/colibri/colibri.go, line 59 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

This bit confused me a lot. It seems to be correct, but overwriting byte 4 with the coreID and then overwriting byte 3 with the timestamp was really surprising.
Maybe this could be changed to packedId := (coreCounter & 0xff ff ff) | (coreID & 0xff << 24). Then we can even invoke the CreateColibriTimestampCustom and avoid the repeated implementation.

The function has changed a bit, PTAL again.

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 32 of 200 files reviewed, 28 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


doc/protocols/scion-header.rst, line 806 at r10 (raw file):

Previously, juagargi (Juan A. Garcia Pardo) wrote…

This part came verbatim from epic HP, so I might be wrong with my answer: the numerator can go up to 16 seconds, and the denominator is always normalizing in 4 ns units, so the fraction would go as high as 2^32, which doesn't fit in 4 bytes. With -1 it fits.
Paging @mlegner here.

As mentioned below, 2^32 * 4ns ~= 17s, or put the other way around, 16s / 4ns == 4e9 < 2^32 ~= 4.29e9. Or still another way, about 7% of the representable range is invalid because it represents a timestamp larger than the expiration time.

The offset of 1 == 4ns does really not significantly affect the range. The same is true for EPIC HP, I don't understand why this was chosen there.

Btw, because you mention that this is copied from EPIC, i took a closer look at the packet ID. For EPIC, the packet ID fields are a slightly oversized (64-bits), as already ~48-bit would suffice to uniquely identify packets for the possible lifetime (24 hours) of a path even at truly ridiculous speeds of 1e9 packets/s (at least a factor 10 away from what can currently be achieved on a single machine). Note that at these speeds, SCION security starts to break down, as guessing attacks for the MACs start to become possible (48-bit HF MAC).
For Colibri, the max lifetime is much shorter, so it seems using a much shorter packet ID should do. Effectively, I'd suggest to drop the counter part entirely, and only use the 32-bit timestamp.

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 32 of 200 files reviewed, 28 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/colibri/colibri.go, line 180 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

subtle.ConstantTimeCompare(mac[:4], currHop.Mac)

Done.


go/lib/colibri/colibri.go, line 219 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Programming error; no need to check, just let it panic in CryptBlocks.

Done.


go/lib/colibri/colibri.go, line 221 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Avoid this allocation and the allocation for the input buffer (in particular for the VerifyMAC use in the router).; let the caller provide a buffer and encrypt in-place, similar to how this is done for the HF MAC (improvements on master). Same applies to the other CalculateColibriMac* and prepareMacInput* functions.

Let's postpone these performance optimizations for later. I can add a comment to the code with a TODO(juagargi) performance: and describe the issue on each of them, not to loose the review.


go/lib/colibri/colibri.go, line 223 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…
16

Many repetitions of this 16, maybe nicer to define const blockSize = 16 (or use aes.BlockSize).

Done.

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 32 of 201 files reviewed, 24 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/colibri/colibri.go, line 59 at r10 (raw file):

Previously, juagargi (Juan A. Garcia Pardo) wrote…

The function has changed a bit, PTAL again.

💯


go/lib/colibri/colibri.go, line 82 at r10 (raw file):

Note that we recommend to use the CoreID and CoreCounter split, and I guess we will follow our recommendation ��‍♂️

Ok but we're parsing someone elses packet here. Anyway, doesn't matter, nobody ever looks at the return value of this function anyway...


go/lib/colibri/colibri.go, line 70 at r12 (raw file):

	// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	ts := colibri.Timestamp{}
	binary.BigEndian.PutUint64(ts[:], (uint64(tsRel)<<32)|uint64(pktId))

That's a bit odd now, why not PutUint32 for each of the two components. Same result, but still...

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 32 of 201 files reviewed, 17 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/addr/host.go, line 289 at r10 (raw file):

Previously, juagargi (Juan A. Garcia Pardo) wrote…

Wait, I thought that SvcSB was for Sibra.
Anyway this looks to me like a discussion on a minor topic, and I would gladly think about changing it if I had more time.

Ok 🤷


go/lib/colibri/colibri.go, line 221 at r10 (raw file):

Previously, juagargi (Juan A. Garcia Pardo) wrote…

Let's postpone these performance optimizations for later. I can add a comment to the code with a TODO(juagargi) performance: and describe the issue on each of them, not to loose the review.

Ok


proto/discovery/v1/discovery.proto, line 74 at r9 (raw file):

In the border router the C=1 packets are forwarded to the colibri anycast address

I just checked; the router only uses the SvcCol address as map key to look up the service, but it forwards the packet to the corresponding underlay IP address. The service address is not used on "the wire" here.
But it looks like getting rid of using the service address here would require some refactoring or special casing, so it probably makes sense to keep it like this.

Related to this observation, I'm now a bit confused on how colibri packets with C=1 ever arrives at the Colibri service' socket in the intermediate hops. AFAIU, the packet is addressed to the destination AS, so dispatcher should not forward it to a normal socket?

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 32 of 201 files reviewed, 18 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/slayers/scion.go, line 183 at r14 (raw file):

	// Serialize path header.
	if c, ok := s.Path.(*colibri.ColibriPathMinimal); ok {

if opts.FixLengths && ...

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r1, 2 of 24 files at r3, 15 of 122 files at r5, 2 of 29 files at r6, 33 of 147 files at r7, 4 of 28 files at r9, 2 of 6 files at r10, 3 of 5 files at r11, 1 of 7 files at r12, 1 of 1 files at r14.
Reviewable status: 96 of 201 files reviewed, 24 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/cs/cs.go, line 591 at r14 (raw file):

		Public:   nc.Public,
		Intfs:    intfs,
		Router:   router,

Unused. Changes above can be reverted, too.


go/cs/cs.go, line 626 at r14 (raw file):

	})
	if err != nil {
		return serrors.WrapStr("starting periodic tasks", err)

Can we fix this upstream instead?


go/lib/pathpol/sequence.go, line 122 at r14 (raw file):

// EvalInterfaces is analogous to Eval, but accepts anything that has interfaces.
func (s *Sequence) EvalInterfaces(paths []snet.PathInterfacesHaver) []snet.PathInterfacesHaver {

This additonal PathInterfaceHaver seems quite clunky. Perhaps rather define this on [][]snet.PathInterface? Or, in fact, just export the eval to check a single path directly, because that's what we use.

Btw. why is eval not identical to the loop body in Eval above, couldn't the logic be shared? Could this addition be upstreamed separately to reduce the diff?


go/lib/snet/path.go, line 74 at r14 (raw file):

// The interface list returned typically has an even number, as it traverses
// N ASeswith 2 interfaces each.
type PathInterfacesHaver interface {

See comment in pathpol, this seems unnecessary.


go/pkg/cs/tasks.go, line 51 at r14 (raw file):

	Intfs           *ifstate.Interfaces
	OneHopConn      snet.PacketConn
	Router          snet.Router

Unused


go/pkg/sciond/colibri/grpc_client.go, line 30 at r14 (raw file):

type DaemonClient struct {
	Dialer coliquic.GRPCClientDialer

Nit. This is puzzling at first sight, as it appears as this would be using quic, but it isn't. The"coliquic" stuff isn't used as far as i can tell, so just define it as Dialer libgrpc.Dialer directly?


type RenewalSuccess func(*Reservation)

// RenewalError is a function that is called whenever there is an error during renewal.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest RenewalErrorHandler instead of RenewalError.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And same for RenewalSuccess

// Two use it, one must first create one through a call to NewReservation, and then
// manage its lifetime with a call to Open and Close. Close must always be called
// to free up the resources after the Reservation is no longer needed.
// Internally, the Reservation has a periodic Runner handling the renewal of the reservation.
type Reservation struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if this reservation/path were used to:

  • Contact a different destination machine in the same IA (in terms of snet.path, this path would be valid).
  • Contact the same destination machine but a different port.

AFAIK both these should not work because the reservation is bound to a specific IA+machine+port. A flexible approach on snet.Path would be a function canReachAddress(snet.UDPAddr), where each path can report for itself whether it can be used to reach a specific UDP address. Using this, an incompatible path can be detected before even attempting to send any data over the wire. The only current check for path-connection compatibility that I know of is here, where only the destination IA is checked.

Copy link
Member

@mlegner mlegner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 93 of 205 files reviewed, 26 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


doc/protocols/scion-header.rst, line 806 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

As mentioned below, 2^32 * 4ns ~= 17s, or put the other way around, 16s / 4ns == 4e9 < 2^32 ~= 4.29e9. Or still another way, about 7% of the representable range is invalid because it represents a timestamp larger than the expiration time.

The offset of 1 == 4ns does really not significantly affect the range. The same is true for EPIC HP, I don't understand why this was chosen there.

Btw, because you mention that this is copied from EPIC, i took a closer look at the packet ID. For EPIC, the packet ID fields are a slightly oversized (64-bits), as already ~48-bit would suffice to uniquely identify packets for the possible lifetime (24 hours) of a path even at truly ridiculous speeds of 1e9 packets/s (at least a factor 10 away from what can currently be achieved on a single machine). Note that at these speeds, SCION security starts to break down, as guessing attacks for the MACs start to become possible (48-bit HF MAC).
For Colibri, the max lifetime is much shorter, so it seems using a much shorter packet ID should do. Effectively, I'd suggest to drop the counter part entirely, and only use the 32-bit timestamp.

I agree with @matzf with respect to the -1. The original idea with the counter was that it would allow multiple cores to coordinate such that no duplicate packet IDs occur.

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 79 of 206 files reviewed, 22 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/colibri/colibri.go, line 248 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Can we include the function CalculateColibriMacSigma in CalculateColibriMacPacket? E.g.. keep it two separate local helper functions, but the public function only needs to be available for the combined functionality (nothing is caching the intermediate result anyway).

Done.


go/lib/colibri/colibri.go, line 337 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Shouldn't this rather be checked when parsing the SCION packet (I guess it already is). If this is just an assertion, panic instead of returning an error.

Done.


go/lib/colibri/colibri.go, line 350 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

nrBlocks := (bufLen + blockSize - 1) / blockSize, no float.

Done.


go/lib/colibri/colibri.go, line 399 at r11 (raw file):

Previously, matzf (Matthias Frei) wrote…

For the segment reservations, this is only 4 bytes, right? Removing these 8 bytes of padding would seem to allow to bring this down to a single block for the static MAC (although these are probably not the ones that we need to optimize).

Because of efficiency reasons in both the border router and the bandwidth monitors, there is no distinction in the size of the suffix.
For the static MAC the suffix is always going to look like 4 bytes with data + 8 more zeroes, so I see no harm into reducing the suffix space for this static MAC computation.
We will probably want a benchmark to assess the performance difference.
I have added a note in the code to work on this during the "performance iteration" for the BR.


go/lib/pathpol/sequence.go, line 122 at r14 (raw file):

Previously, matzf (Matthias Frei) wrote…

This additonal PathInterfaceHaver seems quite clunky. Perhaps rather define this on [][]snet.PathInterface? Or, in fact, just export the eval to check a single path directly, because that's what we use.

Btw. why is eval not identical to the loop body in Eval above, couldn't the logic be shared? Could this addition be upstreamed separately to reduce the diff?

Done.


go/lib/scrypto/mac.go, line 72 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Remove the error return if this never errors. Previously, this had a check for the length of the key with a panic (assertion) -- maybe that can be added back instead of the error.

Done.


go/lib/slayers/scion.go, line 183 at r14 (raw file):

Previously, matzf (Matthias Frei) wrote…

if opts.FixLengths && ...

Done.


go/pkg/router/colibri_processing.go, line 189 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Why all these checks? Programming error, just let it crash (i.e. just access the fields, or check and panic).

Done.


go/pkg/router/colibri_processing.go, line 270 at r10 (raw file):
The function destinedToLocalHost evaluates if the packet is destined to the local AS. I've changed the name to destinedToLocalAS.

All these checks should be redundant if the packet is valid. Explicit validation of this is missing, and I believe this is also not covered by cryptographic check. Now a packet that would fail one of these checks would appear to land in a weird edge case (attempted to be forwarded).

Did you intend the comment for another function maybe? If not, I don't understand, as all the checks done in destinedToLocalAS are novel, and necessary to determine if it needs to call forwardToLocalAS


go/pkg/router/dataplane_test.go, line 1310 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

I could be mistaken, but I think you don't need these To4() calls everywhere.

They seem necessary. I tried without them, and indeed the IP object from the expected is 16 bytes, while the computed is 4 bytes, and the require.Equal fails.


go/pkg/sciond/colibri/grpc_client.go, line 30 at r14 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit. This is puzzling at first sight, as it appears as this would be using quic, but it isn't. The"coliquic" stuff isn't used as far as i can tell, so just define it as Dialer libgrpc.Dialer directly?

Done.


go/pkg/cs/tasks.go, line 51 at r14 (raw file):

Previously, matzf (Matthias Frei) wrote…

Unused

Done.

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defects document here:
https://docs.google.com/document/d/13DG2vEcZP1CsjTikASKb8TmQ0f-vBoX4XJEzhX85yXw/edit?usp=sharing

Reviewable status: 79 of 206 files reviewed, 21 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


doc/protocols/scion-header.rst, line 806 at r10 (raw file):

Previously, mlegner (Markus Legner) wrote…

I agree with @matzf with respect to the -1. The original idea with the counter was that it would allow multiple cores to coordinate such that no duplicate packet IDs occur.

I have added this to the defects document, "design" section.


go/cs/cs.go, line 591 at r14 (raw file):

Previously, matzf (Matthias Frei) wrote…

Unused. Changes above can be reverted, too.

Done.


go/cs/cs.go, line 626 at r14 (raw file):

Previously, matzf (Matthias Frei) wrote…

Can we fix this upstream instead?

scionproto#4107.
Fix also present in this branch. If diff sees a conflict, it'll be extremely easy to understand.


go/lib/colibri/colibri.go, line 70 at r12 (raw file):

Previously, matzf (Matthias Frei) wrote…

That's a bit odd now, why not PutUint32 for each of the two components. Same result, but still...

Done

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 79 of 206 files reviewed, 21 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/colibri/colibri.go, line 221 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Ok

I needed to refactor the MAC functions (due to another review comment), and used buffers as arguments and stack variables, avoiding heap allocations. PTAL again.


go/lib/colibri/colibri.go, line 274 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

I'm confused here, why is there another way to compute the MAC? Isn't MACInput + StaticMAC the same as CalculateColibriMacStatic? What is the difference between MACInput and prepareMacInputStatic?

Done. It should be now clear that MACInput prepares the input, and that prepareMacStatic is just a stub that unwraps the info field and scion layer.


go/lib/colibri/colibri.go, line 390 at r11 (raw file):

Previously, matzf (Matthias Frei) wrote…

I got a bit lost here with the various prepareInput... functions. What could help, I think, would be to use a more specific term for what is called "Input Data" here and in the docs, e.g. "Reservation Info" or something like that. Consequently, rename this function to e.g. prepareMacInputReservationInfo.

The functions have changed name. They are now MACInput{Static,Sigma,E2E}.
The MAC derivation functions are called similarly, MAC{Static,Sigma,E2E}
The special function that computes the MAC from an input buffer is called MACStaticFromInput. It's used only when we don't have the slayers parts, like in the colibri store.


go/lib/colibri/client/reservation.go, line 40 at r15 (raw file):
Good questions.

What would happen if this reservation/path were used to:

  • Contact a different destination machine in the same IA (in terms of snet.path, this path would be valid).

According to our design, the source and destination addresses (typically IPv4 or v6) are part of the computation of the MAC for E2E reservations. At the moment the MAC computation is not fully finished, and we are using the static MAC also for the E2E reservations, which do not cover the src and dst addresses, but this will be implemented in the next iteration.

  • Contact the same destination machine but a different port.

This is interesting: at the moment (current implementation and design) nothing prevents an application from making a reservation and then contact a different port.


go/lib/colibri/client/reservation.go, line 59 at r15 (raw file):

Previously, JonasGessner wrote…

And same for RenewalSuccess

Done.


go/pkg/router/colibri_processing.go, line 66 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

I really like how straight forward and readable this validation and forwarding logic is!

What is missing is SCMP errors for invalid packets and for the internal/external interface down cases (should at least be mentioned in a TODO). Note that there have been some changes in master/scionlab related to this, so perhaps you want look at this after rebasing.

I didn't know this was missing!
Added a TODO comment in the code, plus an entry in the defects document. This is kind of important, but to move forward quickly we will fix it in the next iteration.


proto/discovery/v1/discovery.proto, line 74 at r9 (raw file):

Previously, matzf (Matthias Frei) wrote…

In the border router the C=1 packets are forwarded to the colibri anycast address

I just checked; the router only uses the SvcCol address as map key to look up the service, but it forwards the packet to the corresponding underlay IP address. The service address is not used on "the wire" here.
But it looks like getting rid of using the service address here would require some refactoring or special casing, so it probably makes sense to keep it like this.

Related to this observation, I'm now a bit confused on how colibri packets with C=1 ever arrives at the Colibri service' socket in the intermediate hops. AFAIU, the packet is addressed to the destination AS, so dispatcher should not forward it to a normal socket?

Indeed, one of the things that must be changed (and still pending on the list for this PR) is fixing the data plane for the control plane packets (C=1).
In the colibri design the packet is forwarded unaltered to the transit colibri service, but we might have to change the dispatcher or the control plane to adapt for the differences in the implementation.

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 51 of 208 files reviewed, 21 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/pkg/router/colibri_processing.go, line 39 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Unused (an the corresponding field in the "main" packet processor has been removed on master).

Done. I have just rebased on the last scionlab and found this.

@juagargi juagargi force-pushed the scionlab_colibri_feature branch 2 times, most recently from 0a04720 to 204c32b Compare August 30, 2021 13:29
@JonasGessner
Copy link

What would happen if this reservation/path were used to:
Contact the same destination machine but a different port.

This is interesting: at the moment (current implementation and design) nothing prevents an application from making a reservation and then contact a different port.

This sounds to me like it drastically decreases the security provided by the whole reservation accepting/denying system. If I can make a reservation to port X and I can use that reservation to send data to port Y, which would itself reject the reservation, then it almost defeats the entire purpose of allowing to deny reservations in the first place. If a reservation can be used for any port, then a machine can realistically only enforce either accepting all reservations or rejecting all reservations, regardless of port.

Contact a different destination machine in the same IA (in terms of snet.path, this path would be valid).

According to our design, the source and destination addresses (typically IPv4 or v6) are part of the computation of the MAC for E2E reservations. At the moment the MAC computation is not fully finished, and we are using the static MAC also for the E2E reservations, which do not cover the src and dst addresses, but this will be implemented in the next iteration.

Ok. Coming back to my suggestion of adding a method to paths that allows checking if the path can be used to contact a certain address: With such a method one can detect invalid paths before ever sending a packet. It's not anything particularly useful, but acts as a guard against programming mistakes at the very least.

@matzf
Copy link

matzf commented Sep 14, 2021

  What would happen if this reservation/path were used to:
   Contact the same destination machine but a different port.

This is interesting: at the moment (current implementation and design) nothing prevents an application from making a reservation and then contact a different port.

This sounds to me like it drastically decreases the security provided by the whole reservation accepting/denying system. If I can make a reservation to port X and I can use that reservation to send data to port Y, which would itself reject the reservation, then it almost defeats the entire purpose of allowing to deny reservations in the first place. If a reservation can be used for any port, then a machine can realistically only enforce either accepting all reservations or rejecting all reservations, regardless of port.

This makes sense though, Colibri is a mechanism in the network layer, ports are one layer higher -- making Colibri aware of ports would be clunky.
I think Jonas has a very good point here though, giving such a weak guarantee is not very helpful -- so, what is Colibri trying to provide by letting the destination end host approve the E2E reservation? The destination AS has already approved, so we can assume that the network capacities are available to handle this reservation. My interpretation is that this provides a sort of a flow control mechanism, i.e. a way such that the receiver can signal the rate at which it is able to process incoming traffic. While this can certainly be helpful in certain situations, I think this is the wrong place to do this. The transport layer protocol should be responsible for flow control -- Colibri does not provide guaranteed delivery either, so it's clear that some form of transport layer control will still be needed even when taking full advantage of Colibri.
If we'd remove this end host approval mechanism, how would Colibri change? Do the E2E reservations still need to be approved by all on-path ASes, or would it then be enough to do the admission in the source AS?

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both concerns have been included as "COLIBRI for Applications (port numbers)" in the defects document, to review the design.
For now we keep the original design based on the paper and Dominik's implementation.

Reviewable status: 46 of 210 files reviewed, 21 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/colibri/colibri.go, line 207 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

This MAC function should be reused (in particular for the VerifyMAC use in the router), like for the hop field MAC. Initializing an AES cipher allocates memory and does some work to expand the keys.

Added a TODO to fix it later.

@juagargi juagargi marked this pull request as ready for review October 5, 2021 14:39
Colibri.header.doc (#22)
Add 'Original Payload Length' to Info Field.
Correct flag consistency checks.
colibri border router (#24)
simplify parsing. (#25)
Colibri.experiments (#27)
Performance changes, fixes and tests added for the NSDI paper.
Per-packet MAC: replace payload size input with total packet size input. Static and sigma MAC InputData: Remove HFCount input. Fix warnings in Colibri doc. (#26)
gRPC with COLIBRI primer (#28)
Colibri initiator (#29)
gRPC for COLIBRI (#30)
Replace ColibriService.md with ColibriDesign.rst
Resolve failing /go/pkg/gateway/control/fake:go_default_test
Fix linter errors.
Colibri.topology entry (#31)
Colibri.e2e.basic api (#33)
Colibri.move to co (#34)
Colibri.fix data plane (#35)
fix e2e mac computation, simplify. (#37)
For now use the static MAC computation also for E2E.
Add tests.
Colibri.extended api (#38)
Colibri.refactor (#39)
changes from review.
changes from review.
changes from review.
missing build file in go/co
Colibri.extended api (#40)
Colibri.cleanup (#41)
Colibri.review (#43)
Colibri.review (#44)
use a auto-freshen cache for the colibri listings in sciond (#45)
several fixes.
Colibri.remove todos (#46)
Colibri.admission list (#47)
Colibri.remove tod os (#48)
Colibri.integration test (#49)
remove stray deleteme
fixes.
local topos with 1Gbps capacity for colibry purposes
remove last deleteme.
Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 20 files at r17, 21 of 53 files at r18, 5 of 16 files at r19, 2 of 7 files at r20, 3 of 39 files at r21, 33 of 73 files at r22.
Reviewable status: 106 of 228 files reviewed, 13 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/integration/colibri_integration/main.go, line 1 at r22 (raw file):

// Copyright 2021 ETH Zurich

This appears to be mostly shrunk down version of end2end_integration. Can we reuse the existing tooling by running end2end_integration -cmd ./bin/colibri instead?


go/pkg/router/colibri_processing.go, line 270 at r10 (raw file):

Previously, juagargi (Juan A. Garcia Pardo) wrote…

The function destinedToLocalHost evaluates if the packet is destined to the local AS. I've changed the name to destinedToLocalAS.

All these checks should be redundant if the packet is valid. Explicit validation of this is missing, and I believe this is also not covered by cryptographic check. Now a packet that would fail one of these checks would appear to land in a weird edge case (attempted to be forwarded).

Did you intend the comment for another function maybe? If not, I don't understand, as all the checks done in destinedToLocalAS are novel, and necessary to determine if it needs to call forwardToLocalAS

Replying to old thread, sorry I don't remember seeing this follow up question.

What i meant is that each one these three checks (dstIA == localIA, egressID == 0 and isLast) should be enough to determine that the packet is destined to the local AS. If the results of these checks are not consistent, it was a bad packet. There is however no explicit consistency check and so it seems possible that with with the current logic of and-ing these checks, the packet is considered not destined-to-local-AS if e.g. dstIA == localIA but egressID == 3, which could send it on the code path to be forwarded to an external interface.

Add a subcommand to the scion application that displays reservations.
The command reports segment reservations from src to dst, and
valid stitched full trips.
* use end2end_integration to run colibri.

Fix integration/colibri so that if called as client,
with src and dst being the same, it will just return success.
Add -progress flag (default is true) to end2end_integration.

* remove colibri_integration .

The end2end_integration test is able to run colibri with -cmd ./bin/colibri .
Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 104 of 235 files reviewed, 13 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/integration/colibri_integration/main.go, line 1 at r22 (raw file):

Previously, matzf (Matthias Frei) wrote…

This appears to be mostly shrunk down version of end2end_integration. Can we reuse the existing tooling by running end2end_integration -cmd ./bin/colibri instead?

Done.

An egress is 0 iif dst==local IA, and
current hop is 0 iif egress is 0 .
Fix wrong UTs.
Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 102 of 235 files reviewed, 13 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/pkg/router/colibri_processing.go, line 270 at r10 (raw file):

Previously, matzf (Matthias Frei) wrote…

Replying to old thread, sorry I don't remember seeing this follow up question.

What i meant is that each one these three checks (dstIA == localIA, egressID == 0 and isLast) should be enough to determine that the packet is destined to the local AS. If the results of these checks are not consistent, it was a bad packet. There is however no explicit consistency check and so it seems possible that with with the current logic of and-ing these checks, the packet is considered not destined-to-local-AS if e.g. dstIA == localIA but egressID == 3, which could send it on the code path to be forwarded to an external interface.

I understand now. I have extended the validation check, and used a simpler egress == 0 to flag a packet as destined to the local IA.

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 73 files at r22, 2 of 12 files at r23, 1 of 2 files at r24.
Reviewable status: 106 of 235 files reviewed, 2 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained

Copy link

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 106 of 235 files reviewed, 3 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/colibri/colibri.go, line 180 at r24 (raw file):

func MACInputStatic(buffer []byte, suffix []byte, expTick uint32,
	bwCls reservation.BWCls, rlc reservation.RLC, controlFlag, reverseFlag bool,
	idx reservation.IndexNumber, srcAS, dstAS addr.AS, ingress, egress uint16) {

Should this take full ISD-AS instead of only the AS identifier, i.e. srcIA, dstIA addr.IA (and the same for MACStatic)?

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 106 of 235 files reviewed, 3 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


go/lib/colibri/colibri.go, line 180 at r24 (raw file):

Previously, matzf (Matthias Frei) wrote…

Should this take full ISD-AS instead of only the AS identifier, i.e. srcIA, dstIA addr.IA (and the same for MACStatic)?

The input for the MAC computation uses the reservation ID, which is composed of the reservation initiator AS ID plus a suffix. In the signature of the function we take both the source and destination AS IDs because depending on the reverse flag, the initiator is one or the other.

Copy link
Member Author

@juagargi juagargi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @JonasGessner from 2 discussions.
Reviewable status: 106 of 235 files reviewed, all discussions resolved / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained

@juagargi juagargi merged commit 61af520 into netsec-ethz:scionlab Oct 14, 2021
@juagargi juagargi deleted the scionlab_colibri_feature branch October 14, 2021 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants