Skip to content

Interleave sending and receiving of packets to reduce rx latency#515

Merged
kaczmarczyck merged 26 commits intogoogle:developfrom
liamjm:send_or_recv
Aug 4, 2022
Merged

Interleave sending and receiving of packets to reduce rx latency#515
kaczmarczyck merged 26 commits intogoogle:developfrom
liamjm:send_or_recv

Conversation

@liamjm
Copy link
Copy Markdown
Contributor

@liamjm liamjm commented Jul 19, 2022

As discussed in #502 and #514 packet processing latency is too slow when both interfaces are being used. Another cause for this is that when a reply is being sent, it does so in a tight loop (ie inner loop in the main function) without receiving any other packets. This can easily introduce hundreds of milliseconds delay.

This PR removes the innner loop in main and leaves just a single loop. Each iteration it will optionally send a packet and always attempt to receive an incoming packet (#514 ensures suitable routing of the oldest packet). For our use case, the processing of these packets will be interleaved across the FIDO and Vendor HID interfaces (if both are present).

Several changes are required for this:

  1. SendOrRecvStatus enum now includes the UsbEndpoint in the Received variant.
  2. Code that uses SendOrRecvStatus handles the UsbEndpoint.
  3. The HidConnection::send_or_recv_with_timeout now allows any endpoint to be received, not just the one it sent one.
  4. HidPacketIterator now supports peeking to see if there is any data.
  5. main.rs has new structs to support the state tracking and large changes to the loop.
  6. tock CtapUsbSyscallDriver now sends packet before receiving incoming packet. Without this the tx packet is lost.

src/ctap/mod.rs Outdated
debug_ctap!(env, "Sent KEEPALIVE packet");
}
Ok(SendOrRecvStatus::Received) => {
Ok(SendOrRecvStatus::Received(_)) => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we receive on a different endpoint than the one being kept alive? Does it mean we discard it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find.

At present we discard it. I've added a TODO to address this (for a later PR), but I can address it here is you'd prefer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's not really a regression (but I might be wrong). So I would say it's ok to do it later.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a regression if we receive, but the channel ID is by chance the same. Then a different transport would interact with this channel.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point.
I can add a check for this.

However, I am wondering about a bigger change. If the code ignores packets on the other interface, then those packets are lost. So perhaps the better approach is to send() but not send_and_recv() at this time. That way the incoming packets are not serviced and will sit there. Looking at the code, it looks challenging to connect incoming packets to the other interface without massive refactoring.
Thoughts?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree that a send / receive interface would be cleaner. The current solution is there for practical reasons. USB Request Blocks (URB) that need to be answered quickly. And if you only send and have the syscall roundtrip delay from that, you can't answer OUT packets in time.

If you try to go for a send / receive interface, please test for everything that didn't work last time.
@gendx @jmichelp are the original authors who made that decision. Maybe they can help describe a few test cases to look out for?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

What I've done in the short term is ignore those packet on the other interface. This stops the regression and leaves room for the above discussed improvements at a later date.
I've add tests to ./tools/vendor_hid_tests.py to verify this behavior.

ia0
ia0 previously approved these changes Jul 22, 2022
Copy link
Copy Markdown
Member

@ia0 ia0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on my side. I'll be OOO until Aug 3rd so don't wait for me if there are me updates.

src/ctap/mod.rs Outdated
debug_ctap!(env, "Sent KEEPALIVE packet");
}
Ok(SendOrRecvStatus::Received) => {
Ok(SendOrRecvStatus::Received(_)) => {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a regression if we receive, but the channel ID is by chance the same. Then a different transport would interact with this channel.

@coveralls
Copy link
Copy Markdown

coveralls commented Jul 25, 2022

Coverage Status

Coverage decreased (-1.3%) to 88.573% when pulling 31cc379 on liamjm:send_or_recv into 6276904 on google:develop.

Copy link
Copy Markdown
Collaborator

@kaczmarczyck kaczmarczyck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 high level comments:

  1. The new main is different enough (i.e. might discard replies) that we should test OpenSK a bit with it.

  2. The test script entangles raw HID and libfido. I don't think there is an elegant solution for this, but I thought it's interesting to mention that there is a way to make libfido talk to the second usage page. You monkey patch it's usage page:

import fido2
fido2.hid.base.FIDO_USAGE_PAGE = 0xFF00

You have to be careful to do that after GetInfo though, or we patch OpenSK to allow GetInfo on the second usage page. Not saying this approach is better or you should do it, it's just a different tradeoff I think.

@liamjm
Copy link
Copy Markdown
Contributor Author

liamjm commented Aug 4, 2022

2 high level comments:

  1. The new main is different enough (i.e. might discard replies) that we should test OpenSK a bit with it.

Testing with https://github.com/google/CTAP2-test-tool shows no differences.

  1. The test script entangles raw HID and libfido. I don't think there is an elegant solution for this, but I thought it's interesting to mention that there is a way to make libfido talk to the second usage page. You monkey patch it's usage page:
import fido2
fido2.hid.base.FIDO_USAGE_PAGE = 0xFF00

You have to be careful to do that after GetInfo though, or we patch OpenSK to allow GetInfo on the second usage page. Not saying this approach is better or you should do it, it's just a different tradeoff I think.

Good approach. I've used this idea.
The value is patched only for the period of enumerating the devices for the Vendor HID device, so should be OK.

@kaczmarczyck
Copy link
Copy Markdown
Collaborator

If everything works, then I'm happy with it!
@jmichelp can always complain later if something is wrong that I didn't account for :)
I'll merge #523 first.

@kaczmarczyck kaczmarczyck merged commit 4a2217f into google:develop Aug 4, 2022
@liamjm liamjm deleted the send_or_recv branch August 4, 2022 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants