Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore CID sequence numbers #1465

Merged
merged 6 commits into from Jun 26, 2018
Merged

Conversation

MikeBishop
Copy link
Contributor

@MikeBishop MikeBishop commented Jun 20, 2018

After trying to find a better way to manage CID changes, probing, etc. I concluded that what we had before and removed is a superior place to start than anything else I'm coming up with.

The result of this PR is that, while we no longer have packet number gaps (so #1280 is no longer an issue), the sequence of CIDs issued by each endpoint is once again ordered. On each path (local address), you know the sequence number of the CID you're currently using, and you can recognize the sequence number of the CID you're currently receiving (because you assigned it).

Fixes #1452 by requiring that each CID change on a path move to a CID with a greater sequence number; you move to the CID with the matching sequence number if it's usable, but if it's not, you move to something with a higher sequence number. That will force your peer to meet you on that sequence number instead, and either way you land in equilibrium on each path.

This improves, but doesn't totally fix, the core problem on #1464. Once you've moved to the CID with sequence number X, the peer knows you can't use CID X-1 on this path even if you've never used it; if you're on the last / next-to-last CID the peer issued, you need more, even if you have some old ones you could still use on a different path. However, if you have the bad luck to use your last CID for a probe which gets dropped and you're no longer able to send on that path, you can still get stuck if your peer is only giving you one CID at a time. This probably still requires a REQUEST_CONNECTION_ID frame.

Potentially fixes the side-issue discussed on #1464, too. Because the CID sequence must advance per path, you can forget old CIDs you've issued once every path has either moved past that sequence number or been abandoned. New paths must use a higher sequence number than ever used on any path.

However, there's a slight possibility that peers will have different concepts of abandoned (particularly for probing on backup paths that are never migrated to). We could mint a RETIRE_CONNECTION_ID frame to tell the receiver that connection IDs with a sequence before X are no longer recognized by the sender. Or it might be sufficient to send a probe packet back on the potentially-abandoned path with a newer sequence number; assuming this is received successfully, it will force the peer to update the CID used on that path as well (even if it doesn't send any packets back).

@MikeBishop MikeBishop added design An issue that affects the design of the protocol; resolution requires consensus. -transport labels Jun 20, 2018
Copy link
Member

@nibanks nibanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text here is really confusing and I don't understand what it supposed to be going on. Perhaps you can explain more in the comments and we can figure out some better text.

on only one local address, and each local address MUST advance to a connection
ID with a later sequence number each time the connection ID changes. Once a
connection ID has been used, connection IDs with an earlier sequence number MUST
NOT be used for packets with a greater packet number from that local address.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to how this works with probing a new path (but not moving to it). Assume I am currently on Path A using CID 1, then I probe Path B with CID 2. This text seems to state that I may not use CID 1 on Path A with any packet number greater than I used to probe Path B with CID 2. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At that point, you may not use a CID < 1 on path A, and may not use a CID < 2 on path B. You also aren't allowed to use CID = 2 on path A, because it's been used on a different path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically:

  • For each path, the CID sequence number must advance, but the currently-in-use CID for each path is unrelated. You could keep probing forever with CID2 on Path B even while Path A has moved up to CID 39, so long as the path isn't abandoned.
  • Each CID can only be used on a single path. If you used CID2 on Path B, you have to skip CID2 on Path A.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe instead of "Once a connection ID has been used, connection IDs with an earlier sequence number MUST NOT be used for packets with a greater packet number from that local address." something like "Once a new connection ID is used, earlier connection IDs MUST NOT be used for new packets sent from the new local address".


: A variable-length integer. This value starts at 0 and increases by 1 for each
connection ID that is provided by the server. The connection ID that is
assigned during the handshake is assumed to have a sequence of -1. That is,
Copy link
Member

@nibanks nibanks Jun 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a real reason we can't consider the handshake CID 0 and start from there? I'd rather not have to deal with negative numbers in my code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. This is just the existing text. It's actually a little worse than that, because the Server's Preferred Address gives the client a CID to use when probing, so there can actually be two CIDs prior to the first one issued by frame.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then maybe we define handshake as 0, the server's preferred address as 1 (even if not present) and then new CIDs start from there (i.e. 2). I don't see any reason to prevent gaps in the CID sequence number space either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative being -2, -1, and starting from 0. I think the logic there is the principle that we should try to avoid there being invalid values, and 0/1 would be invalid values still present in the frame. If the frame can carry a 0, then 0 should be the first value value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also creates an odd situation in that SPA prompts the client to probe, but the server won't have a corresponding CID at that sequence number, so it will have to use the first CID the client issued to it -- if the client issues one with/before the probe! -- which will force the client to advance again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we have some weird special case, and have handshake and SPA are both considered 0? I understand and agree that all values should be valid, but I just really don't like the idea of a negative number if we don't absolutely need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be very many connection IDs during the handshake with this new Retry packet, so it's all special casing here. Start at 2^16-N if you feel compelled to track things this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just eliminate this statement and let it be implementation-local? The new text I added to the CID section says that "The series of
connection IDs issued by an endpoint is ordered, with the final connection ID
selected during the handshake coming first."

We could also simplify this by removing the CID from SPA and simply saying that the server needs to send the client a CID it can use for the probe pretty promptly if it wants the client to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that would help with the suggestion in #1468. Insisting on a fixed starting point ensures that you can do things like matching.

Removing the connection ID from the preferred address would make it harder to get that right because you then need to wait for a NEW_CONNECTION_ID frame before migrating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged this, so let's open an issue to close this out.

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a little tricky.

support this, multiple NEW_CONNECTION_ID messages are needed.
support this, multiple NEW_CONNECTION_ID messages are needed. Each
NEW_CONNECTION_ID is marked with a sequence number. Connection IDs MUST be used
on only one local address, and each local address MUST advance to a connection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps
... on only one local address. An endpoint MUST use a connection ID with a higher sequence number if it changes to using a different local address or network interface. Once an endpoint has used a connection ID for packets sent from a given local address, they MUST NOT use a connection ID with a lower sequence number with that address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, create a new section on connection ID use along the lines I describe below and cite that instead.


: A variable-length integer. This value starts at 0 and increases by 1 for each
connection ID that is provided by the server. The connection ID that is
assigned during the handshake is assumed to have a sequence of -1. That is,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be very many connection IDs during the handshake with this new Retry packet, so it's all special casing here. Start at 2^16-N if you feel compelled to track things this way.

in sequence numbers; it simply advances to the next currently-available
connection ID it has received. Connection IDs with earlier sequence numbers
which arrive later MAY be retained for use on other local addresses or
discarded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG wall of text :)

When this happens, it's probably cause to create a new section.

FWIW, I like the idea that once you use a connection ID on a given interface, you can't use a lower-numbered one. That could extend to responses. Once you see a connection ID on a given tuple, then you can't use a lower-numbered one. That leads to some simple rules:

When you use a new interface/address you MUST use of a new connection ID.
When you use a new connection ID, you MUST use one with a higher sequence number than you have ever used before.
You MUST NOT use a connection ID with a lower sequence number than you have either sent or received on that path.

Formulating the text around that is what we need.

@MikeBishop
Copy link
Contributor Author

Moved to a separate section. Doesn't actually change the line count any, but hopefully it's clearer.

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure about the organization here, but the text is definitely worth having. We definitely need to do some reorganizing of the document real soon now, so I'm not going to stress about where text goes.

sequence number than the highest sequence number of any connection ID ever sent
or received on that local address.

Implementations SHOULD ensure that peers have a connection ID with a matching
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't MUST, then the above MUST NOT will just result in broken connections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the trade-off is that lost packets can cause a peer to appear to violate this temporarily, so it's not realistically enforceable by a peer. You can't "MUST ensure" something has arrived on the other side, only detect whether it has or not. And the connection doesn't actually break; the peer will have to advance to a higher-order CID, which will force the endpoint to move off of the one that doesn't have a match (assuming it still hasn't arrived).

Actually, what a cautious implementation could do is to bundle the NEW_CONNECTION_ID frame with the corresponding CID in every packet between when it changes to a new CID until something gets ACK'd.


An endpoint MUST NOT send packets with a connection ID which has a lower
sequence number than the highest sequence number of any connection ID ever sent
or received on that local address.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be hard to understand. It's precise, but I think that we might provide a little more help. Something like "This guarantees that when an endpoint migrates to a new path the packets sent on that path use new connection IDs in both directions."

I don't know how to say this, but it seems like it might be worth adding: "Note: A connection ID MUST NOT include an unprotected encoding of the associated sequence number. Endpoints need to be able to recover the sequence number associated with each connection ID they provide without relying on information available to the recipient of the connection ID. A connection ID that encodes an unencrypted sequence number could be used to correlate connection IDs across network paths."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I tried to put it mid-discussion, but it didn't flow, so right now it's tucked at the end. Editorial discretion welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CID ratcheting language is imprecise
3 participants