Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Implement a handshake for VSock #1443

Closed
wants to merge 2 commits into from

Conversation

teawater
Copy link

@teawater teawater commented Dec 3, 2019

This RFC implements a WebSocket-like upgrading for host-initiated vsock connection

Background

Currently the host-initiated vsock connection is initiated with a straight-fowward CONNECT <port_num>\n; command:

2. Guest: create an AF_VSOCK socket and listen() on <port_num>;
3. Host: connect() to AF_UNIX at uds_path.
4. Host: send() "CONNECT <port_num>\n".
5. Guest: accept() the new connection.

This works like a charm if the sequence is correct. However, sometimes we don't know whether the guest listener has been deployed when we send the command in STEP 4. As a result, the connection connected in step 3 will be closed silently.

The issue we facing

In practice, we don't know the exact time we have set up the VSock connection successfully because there is no message talking about that. We have to write a dailer to check the message feedback from guest.

eot := make([]byte, 1)
if _, _, err = unix.Recvfrom(int(file.Fd()), eot, syscall.MSG_PEEK); err != nil {
	conn.Close()
	return nil, err
}

This dailer expects the server in guest send message out once it accepts the connection. However, for some servers like ttRPC, they won't send anything to the client. This makes trouble to the client dailer, which has to

  • wait a period and assume it is connected if connection has not been closed; or
  • wait a period before the CONNECT and assume the server has been deployed.

Why not use guest-initiated vsock

We prefer host-initiated link because

  • For security consideration, we don't allow guest initiate a vsock connection;
  • Host-initiated vsock connection are used in other cases (qemu+grpc, qemu+ttrpc, fc+grpc).

The proposed handshake

What we proposed is a simple handshake interaction like WebSocket:

  • A new command called Upgrade <port_num>\n on step 4;
  • 4.5a hybrid vsock response 101\n (Switching Protocol) if connected
  • 4.5b hybrid vsock response 504\n (Service Unavailable) if guest has not listened on the port.
+---------------+                    +----------------+
|               |                    |                |
|               |  Upgrade 3         |                |
|               +------------------->+                |    no server available
|               |                    |                +-------------->
|               |  503               |                |
|               +<-------------------+                |
|               |                    |                |
|               |                    |                |                     +-------------+
|               |   (wait and retry) |                |                     |             |
|               |                    |                |                     |             |
|               |                    |                |                     | listen      |
|               |  Upgrade 3         |                |                     | here        |
|               +------------------->+                |    conn accepted    |             |
|               |                    |                +-------------------->+             |
|               |  101               |                |                     |             |
|               +<-------------------+                |                     |             |
|               |                    |                |                     |             |
|               |   (success)        |                |                     |             |
+---------------+                    +----------------+                     +-------------+
   client in                           hybrid vsock                           server in
     host                                                                       guest

Then the dailer may retry/fail based on the explicit reponse.

@sandreim
Copy link
Contributor

sandreim commented Dec 3, 2019

The CI failed on coverage -0.1% and clippy test, you will need to increase it back.
Also please run 'cargo clippy --all --profile test -- -D warnings' and fix the issues.

@sandreim sandreim added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Dec 3, 2019
@sipsma
Copy link

sipsma commented Dec 3, 2019

(Not a Firecracker maintainer, but work on firecracker-containerd)

Firecracker-containerd ran into the same essential issue this RFC is addressing as we also use host-initiated connections to guest listeners, specifically to connect to TTRPC services running in the guest.

We were able to workaround it by just having our guest listener always write an ack message back before it passed off an accepted connection to be used by the TTRPC server. That solution works for us given our particular use case only requires supporting Firecracker and we fully own the guest-side listener code (I'm guessing Kata has more requirements/complications than that), but it's not necessarily ideal and I appreciate the potential simplifications this RFC could introduce either way.

My only comment is that I'm wondering if the requirement to write CONNECT <port> followed by a separate UPGRADE <port> could potentially be simplified slightly. Would it be possible to instead introduce a new verb like HANDSHAKE <port> that is used in place of CONNECT rather than after it? So, if HANDSHAKE <port> is written, then the connection initiator should expect to read one of the status codes you've specified in the RFC (whereas CONNECT will retain the same behavior it has today). If that does work, it would reduce the number of writes the connector needs to make, which while minor may be a simplification worth considering. However, if there's some implementation detail that makes this more complicated than I'm expecting it's obviously not the end of the world to have to write both CONNECT and UPGRADE.

  • EDIT: On a re-read of the proposal, I may have misunderstood as I think you actually are suggesting UPGRADE being used in place of CONNECT rather than being sent after it, so nevermind about this comment :-)

The overall idea here LGTM either way.

@teawater
Copy link
Author

teawater commented Dec 5, 2019

var data = {files:[
], merged_files:[{"link":"kcov-merged/index.html","title":"[merged]","summary_name":"[merged]","covered_class":"lineCov","covered":"87.8","covered_lines":"19853","uncovered_lines":"2748","total_lines" : "22601"},
]};
var percent_low = 25;var percent_high = 75;
var header = { "command" : "", "date" : "2019-12-05 17:17:42", "instrumented" : 22601, "covered" : 19853,};

@sandreim I had got 87.8% in my part but just got 85.1% with default test.
Could you give me some advices about it?

@andreeaflorescu
Copy link
Member

@teawater sorry for the late reply. We are currently in a business trip. What command do you use for getting the coverage? That seems like a big difference between our CI and your local machine. Usually we get a 0.1% difference if we run on different machines because for some weird reason kcov behaves differently on different platforms.

I would suggest to take a look at the lines you introduced and check if they're all covered. If they're all covered, feel free to lower the coverage here:

This commit add a simple handshake interaction like WebSocket:
A new command called Upgrade <port_num>\n to request connect port.
hybrid vsock response 101\n (Switching Protocol) if connected.
hybrid vsock response 504\n (Service Unavailable) if guest has not
listened on the port.

Signed-off-by: Hui Zhu <teawater@antfin.com>
Update test_peer_request and test_local_request in connection.rs.
Add test_local_request_need_reply and
test_local_request_timeout_need_reply to connection.rs.
Add test_local_connection_with_upgrade,
test_local_close_with_upgrade and
test_local_connection_with_upgrade_get_oprst to muxer.rs.

Signed-off-by: Hui Zhu <teawater@antfin.com>
@dhrgit
Copy link
Contributor

dhrgit commented Dec 11, 2019

@teawater Thanks for contributing and for the comprehensive description of the proposed solution!

I agree that including a handshake does make for a more robust protocol here. In fact, at some point during development, we did have a simple handshake included, but that didn't make the final cut because the code was a bit messy and I underestimated its usefulness.

Going forward, I think we should consider two key points.

Separation of concerns

The vsock connection state machine (devices::virtio::vsock::csm) should work the same, regardless of the vsock backend used. Meaning that, even though we currently implement only one backend option (i.e. via Unix domain sockets), switching backends or adding new ones shouldn't require CSM changes. The initial handshake had this problem, which is why it didn't make the final cut. This holds for this PR as well, since the code responsible for sending the ack back to the host end lives in vsock::csm::Connection. My suggestion is to keep the backend-specific code inside the backend implementation (vsock::unix::* in this case). I've pushed a demo implementation for this to a branch in my fork. It implements the same upgrade protocol as this PR, but does so at the backend level, as opposed to the connection state machine level.

Backwards compatibility

The above solution gets a bit messy, because it has to support both the old CONNECT and the new UPGRADE commands. However, if we were to break backwards compatibility and just go with a simple always-enabled handshake, the solution would be much cleaner. I.e. every CONNECT <port> would either get an OK <assigned host-side port> response, or a connection reset. This was the original handshake, and I've pushed the code for it to another branch in my fork.

I'm kinda inclined towards the clean-code-no-compat approach, so I'd like more opinions on the compatibility issue. Paging: @sboeuf , @sipsma , @firecracker-microvm/compute-capsule - I'd appreciate your input.

@gnawux
Copy link

gnawux commented Dec 12, 2019

On the backward-compatibility, we (@teawater and I) agree that a single CONNECT command could be cleaner if the maintainers don't think the incompatible change matters.

@sboeuf
Copy link
Contributor

sboeuf commented Dec 13, 2019

@dhrgit hi! Sorry I couldn't review the discussion so far, I got busy...
But just wanted to let you know I'll make some time on Monday to review it.

@teawater
Copy link
Author

@dhrgit Will you post a new PR for current issue?

@sboeuf
Copy link
Contributor

sboeuf commented Dec 16, 2019

@teawater
Thank you for raising this issue and proposing such a clear solution. I'm fully supporting this initiative as it would make the consumer's life easier (no need for firecracker-containerd or Kata Containers to come up with their own way to deal with this problem when it's acknowledged this is a common issue).

@dhrgit

The vsock connection state machine (devices::virtio::vsock::csm) should work the same, regardless of the vsock backend used.

Yes I agree it is cleaner if we push the implementation down to each backend.

However, if we were to break backwards compatibility and just go with a simple always-enabled handshake, the solution would be much cleaner.

Oh yes please do that. I think this new vsock solution that you created is recent enough to be changed and introduce some breaking changes. It's better if we don't try to be backward compatible on such a new solution. Let's say the current solution is 0.9 and with the handshake, it would be 1.0 :)

@dhrgit dhrgit mentioned this pull request Dec 17, 2019
8 tasks
@dhrgit
Copy link
Contributor

dhrgit commented Dec 18, 2019

Closing since the issue was addressed by #1472.

@dhrgit dhrgit closed this Dec 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants