Interop with aioquic decryption issue #18

The3ternum opened this issue Dec 15, 2020 · 12 comments

@The3ternum

The3ternum commented Dec 15, 2020

This issue was mentioned previously in the pquic channel on the quicdev Slack.

I also created an issue on the aioquic GitHub page.

I seem to have stumbled on a small decryption issue that exists in the regular implementations of aioquic and PQUIC, and therefore also in other implementations based on them. I will explain my observations across multiple tests here.

I have the following cases:
Case 1: A local aioquic client that tries to download 999999 bytes from a local PQUIC server
(Default aioquic implementation; the PQUIC server has enabled the QLOG plugin)

Case 2: An online interop result from interop.seemann.io between an aioquic client and a PQUIC server
(Test: transfer, flag: DC)

Case 3: My modified local aioquic client that tries to download 999999 bytes from a local PQUIC server
(Modified aioquic implementation that enables multipath; the PQUIC server has enabled the QLOG plugin, as well as the multipath_rr_cond and multipath_qlog plugins)

In all three cases, both endpoints seem to have issues decrypting certain packets transmitted while the handshake is being performed.

In Case 1 I generated a QLOG file on both endpoints (Ac_large.qlog and Ps_large.qlog), and I also stored the console output of the PQUIC server (Ps_large.txt). All files are stored in logs.zip. By loading both QLOG files in qvis.edm.uhasselt.be and selecting the actual traces for both the client and the server (not the GENERATED ones, see the first screenshot), we get a visualisation of the issue, as seen in the second screenshot.

[Screenshot 1: traces]
[Screenshot 2: trace decrypt error]

After receiving an initial handshake packet from the PQUIC server, the aioquic client replies with an initial packet (pn: 1) and a handshake packet (pn: 2). The PQUIC server receives the first packet but fails to decrypt it, logging a "header or decryption error: 408." message and dropping the packet. The second handshake packet isn't mentioned in the console output. In response, the PQUIC server replies with another handshake packet (pn: 1) and two 1RTT packets (pn: 0 and 1). The aioquic client receives these packets and can decrypt the handshake packet and the first 1RTT packet, but not the second 1RTT packet, logging a "packet_decrypt_error". The aioquic client therefore acknowledges the handshake packet and the first 1RTT packet, but not the second 1RTT packet.
Later on, the PQUIC server identifies this second 1RTT packet as lost, as seen in the third screenshot. However, in this case the connection is able to continue, and the 999999 bytes are transferred, resulting in a clean connection termination.

[Screenshot 3: trace packet lost]

By inspecting the client's QLOG and the server's LOG from Case 2 I assume that the same problem occurs, because the client QLOG contains a "packet_decrypt_error", and the server log contains a "header or decryption error: 408.". The connection is also able to continue and terminate as in Case 1.

In Case 3 this problem also occurs, but when the packet is identified as lost by the PQUIC server, the connection is unable to continue. The PQUIC server starts sending 1RTT packets that each contain a PING and PADDING frame, which are all acknowledged by my aioquic client. These last two steps are repeated in a ping-ack loop, as seen in the fourth screenshot, eventually leading to a very long timeout and a connection termination.

[Screenshot 4: trace ping-ack loop]

My questions:
- Is there an issue with the multipath_rr_cond plugin that prevents the server from sending stream data when a 1RTT packet is identified as lost, resulting in the ping-ack loop?
- What is causing the decryption problem?

(Small edit note: when generating your own traces, PQUIC saves its packets in the QLOG file with {"packet_type": "onertt"}, whereas aioquic saves them with {"packet_type": "1rtt"}. Search and replace the PQUIC entries with "1rtt" so that both QLOG files use the same notation; otherwise the visualisation will show almost all packets as "lost".)
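
For convenience, here is a minimal sketch of that search-and-replace as a small script; the in-place rewrite and file handling are just an illustration, not part of PQUIC or aioquic:

```python
#!/usr/bin/env python3
"""Rewrite PQUIC's "onertt" packet_type to aioquic's "1rtt" in qlog files,
so that qvis matches packets from both traces.

Usage: python normalize_qlog.py Ps_large.qlog
"""
import sys

for path in sys.argv[1:]:
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    # qlog files are JSON, so a plain text replacement of the enum value works.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.replace('"onertt"', '"1rtt"'))
```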

@The3ternum
Author

Additional observation: it seems that the undecryptable 1RTT packet being sent from the PQUIC server actually has a length greater than 1500 bytes (1523 bytes to be exact, excluding the header), so the packet is rejected by the decryptor on the aioquic client.
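
For illustration only, a minimal sketch of the kind of size guard that would produce this behaviour; the constant and function names are hypothetical, not aioquic's actual API:

```python
# Hypothetical size guard; not aioquic's actual code.
MAX_CIPHERTEXT_SIZE = 1500  # bytes, excluding the packet header


def check_payload_length(ciphertext: bytes) -> None:
    """Reject payloads larger than the decryptor is willing to handle."""
    if len(ciphertext) > MAX_CIPHERTEXT_SIZE:
        raise ValueError(
            f"payload too long: {len(ciphertext)} > {MAX_CIPHERTEXT_SIZE} bytes"
        )


# The observed 1523-byte payload would be rejected before any AEAD processing:
# check_payload_length(bytes(1523))  -> ValueError
```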

@mpiraux
Member

mpiraux commented Jan 5, 2021

I am working on a fix for this issue. Meanwhile, a small note regarding the "onertt" vs "1rtt" packet type logging: it seems to be a bug on aioquic's side; the spec allows "onertt" and "1RTT" as values for identifying these packets, but not "1rtt". https://quiclog.github.io/internet-drafts/draft02/draft-marx-qlog-event-definitions-quic-h3.html#name-packettype

@rmarx

rmarx commented Jan 7, 2021

Hey @mpiraux, small note on that: qlog actually only allows "1RTT". The onertt thing is because enums in most programming languages cannot start with a numerical character and we modeled the qlog schema after TypeScript, which has that same problem. Arguably that kind of implementation detail should not be in the spec and it has given others some confusion as well, so we'll probably remove it down the line, keeping only 1RTT as an entry.

In qvis specifically, we're more lenient. It doesn't really look for the specific enum values; it just matches identical packet_type string values together when trying to correlate two traces. So you can use "onertt" if you're only doing pquic traces; it just breaks when combined with aioquic's "1rtt" or another implementation's "1RTT". Finally, we do a pass to transform lowercase "1rtt" into uppercase "1RTT", originally to help with the draft-00 to draft-01 transition, so aioquic appears interoperable with other qlogs, even though it is indeed also not 100% correct according to the spec.

Hope that helps :)

@mpiraux
Member

mpiraux commented Jan 7, 2021

I just pushed a big update for Multipath, which basically uses and converts initial paths into uniflows instead of opening new (redundant) uniflows. In our testing this has simplified things quite a bit and fixed some bugs; yours could very well be one of them, as well as the other ones. Could you try the latest commit? I will update the live endpoint in a few minutes.

@The3ternum
Author

Hi, thank you for the fix. It seems that the three packets from the first screenshot (initial 1 and handshake 2 from the aioquic client, and 1RTT 1 from the PQUIC server) are still not decrypted correctly by either endpoint.

  • Between an aioquic CLIENT and a pquic SERVER:

    1. The loss of the 1RTT packet no longer causes the pquic server to remain stuck in a PING-ACK loop.
    2. In addition, the connection seems to finish successfully, transporting all 999999 bytes between the endpoints.
    3. In this configuration the aioquic client is able to establish a second sending uniflow via the path challenge and path response mechanism.
    4. The pquic server is able to respond with MP_ACK frames, and the aioquic client can parse them correctly.
    5. I had a bug where the aioquic client sent a UNIFLOWS frame before its additional sending uniflow was validated, causing only a single entry to exist in the sending uniflows info section. (And I have been able to fix it, so thank you :) )
  • Between an aioquic SERVER and a pquic CLIENT:

    1. I cannot yet confirm whether this connection ends successfully, due to the issues described below.
    2. It seems that, when the aioquic server tries to perform a path validation to establish an additional sending uniflow, the pquic client sends back a PATH_RESPONSE frame (along with an additional PATH_CHALLENGE frame and 2 UNIFLOWS frames). However, the packet that carries these frames never arrives on the aioquic server (there are no errors; the packet isn't even logged as received via a "datagrams_received" event, which happens before the packet is processed). Therefore the aioquic server can never establish its additional sending uniflow (or respond to the PATH_CHALLENGE frame from the pquic client). Additionally, the UNIFLOWS frames from the pquic client contain 2 entries for the receiving uniflows, which is correct, but also 2 entries for the sending uniflows, which I assume is incorrect, since the pquic client's additional sending uniflow has not yet been established. See the screenshot below.
      [Screenshot: disappearing_path_response]
    3. It also seems that the pquic client never sends an ack-of-ack trigger (or ack-eliciting frames), so the aioquic server never responds with its own ACKs, which (about 70 sent packets later) results in the pquic client logging almost every 1RTT packet as a loss event. See the screenshot below.
      [Screenshot: packet_lost_events]
    4. There are some instances where the pquic client sends another PATH_CHALLENGE frame along with a UNIFLOWS frame, to which the aioquic server is able to respond. This also triggers an ACK from the aioquic server, in which all the "lost" packets at that point are acknowledged. The pquic client then also logs a lot of in-flight events. See the screenshot below.
      [Screenshot: ack_eliciting]
    5. After the aioquic server has transported all the data and signaled the end with the stream 0 FIN flag set, the pquic client keeps logging a lot of 1RTT packet loss events. Then the pquic client starts sending PATH_CHALLENGE frames and PING+PADDING frames for a while (which also restart at packet number 1; I assume these come from the additional sending uniflow on the pquic client). These packets are also not received by the aioquic server (again, not even the reception of the packet is logged as a "datagrams_received" event, which happens before the packet is processed). Eventually the pquic client sends a CONNECTION_CLOSE frame, closing the connection. See the screenshot below.
      [Screenshot: connection_end]

I have also included the QLOG files from both the aioquic server and the pquic client, as well as the console log from the pquic client. See Logs.zip.

@The3ternum
Author

One of my questions is what the behaviour of the additional sending uniflows is in the PQUIC endpoint (more specifically, the client), as it seems like it is trying to send data over a new path.

I believe it is also worth mentioning that I have included an address filter for the pquic endpoint. I am not sure if it is correct, but the code should currently only allow "127.0.0.1" to be used. (It seems to work, as the PQUIC endpoint no longer advertises additional addresses.)
If incorrect, what code should I be using instead? And how do I include the IPv6 variant "::1"? See Only_local.zip for the code that I used.

@mpiraux
Member

mpiraux commented Jan 14, 2021

> However, the packet that carries these frames never arrives on the aioquic server (there are no errors; the packet isn't even logged as received via a "datagrams_received" event, which happens before the packet is processed).

It would be interesting to capture where this packet is going. I am guessing the address handling code is bad.

> Therefore the aioquic server can never establish its additional sending uniflow (or respond to the PATH_CHALLENGE frame from the pquic client).

Correct, but it could send the PATH_CHALLENGE again.

> Additionally, the UNIFLOWS frames from the pquic client contain 2 entries for the receiving uniflows, which is correct, but also 2 entries for the sending uniflows, which I assume is incorrect, since the pquic client's additional sending uniflow has not yet been established.

IIRC, in our code, sending uniflows are activated as soon as you receive credit for them in the form of a CID for the particular uniflow. It looks like the specs define the active state as path usage over a validated address. I'll file a bug for that.

> It also seems that the pquic client never sends an ack-of-ack trigger (or ack-eliciting frames), so the aioquic server never responds with its own ACKs

We currently request an ACK to prune ACK ranges if more than 10 blocks are used, but otherwise we rely on the server to issue ACKs from time to time. We could change that to bundle a PING every x ACKs. It's unfortunate the server doesn't send one from time to time.
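
For illustration, a hypothetical sketch of that "PING every x ACKs" idea (not PQUIC code): bundle an ack-eliciting PING with every Nth ACK so the peer is forced to acknowledge from time to time.

```python
# Hypothetical sketch; not PQUIC code. Bundle a PING with every Nth ACK.
ACKS_PER_PING = 10


class AckPingBundler:
    """Decide which frames to send when an ACK is due."""

    def __init__(self) -> None:
        self.acks_sent = 0

    def frames_for_next_ack(self) -> list:
        self.acks_sent += 1
        frames = ["ACK"]
        if self.acks_sent % ACKS_PER_PING == 0:
            # PING is ack-eliciting, so the peer must eventually ACK our ACKs,
            # letting us prune old ACK ranges.
            frames.append("PING")
        return frames
```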

> There are some instances where the pquic client sends another PATH_CHALLENGE frame along with a UNIFLOWS frame, to which the aioquic server is able to respond.

My guess is that it considers the uniflow as broken at some point because it doesn't receive any ACKs, so it marks it as not verified and opens a new uniflow. I think it's a new one because UNIFLOWS frames are only sent once per uniflow.

> After the aioquic server has transported all the data and signaled the end with the stream 0 FIN flag set, the pquic client keeps logging a lot of 1RTT packet loss events. Then the pquic client starts sending PATH_CHALLENGE frames and PING+PADDING frames for a while (which also restart at packet number 1; I assume these come from the additional sending uniflow on the pquic client).

The sending uniflow is again considered as broken at some point I guess.

So to conclude from all your interesting findings:

  • I'll file a bug regarding sending uniflows, path validation and the active state.
  • I suggest that you look at the addresses to which the missing packets are sent.
  • My understanding is that at some point, uniflows are marked as broken, though I can't find the code that would explain that. So I'll dig deeper on that side.

@mpiraux
Member

mpiraux commented Jan 14, 2021

> It would be interesting to capture where this packet is going. I am guessing the address handling code is bad.

I actually just fixed a bug regarding that particular code, which caused memory corruption. The commit is running through our test suite and I should ship it here in the afternoon.
EDIT: There it is 9b27331

@mpiraux
Member

mpiraux commented Jan 14, 2021

> And how do I include the IPv6 variant "::1"?

The code is correct for IPv4 localhost. For IPv6, you need to compare the in6_addr_t to one initialised to "::1", using memcmp for instance.

@The3ternum
Author

It seems that the QLOG file generated by the PQUIC client does not contain "packet_received" or "packet_sent" events, which prevents QVIS from generating a visualisation (according to the console log seen in the screenshot).
[Screenshot from 2021-01-15 17-15-06]
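
As a quick way to confirm this (a hypothetical helper script, not part of qvis or PQUIC), one can count how often the relevant event names appear in the qlog file:

```python
#!/usr/bin/env python3
"""Count selected qlog event names in a file, e.g. to check whether
"packet_sent"/"packet_received" events are present at all.

Usage: python count_qlog_events.py qlog_output_client.qlog
"""
import sys

EVENTS = ("packet_sent", "packet_received", "packet_lost")

for path in sys.argv[1:]:
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    counts = {name: text.count(f'"{name}"') for name in EVENTS}
    print(path, counts)
```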

I have used the following parameters to start the client (./picoquicdemo):

-P plugins/multipath/multipath_rr_cond.plugin -P plugins/qlog/qlog.plugin -P plugins/multipath/multipath_qlog.plugin -P plugins/multipath/addresses_filters/filter_only_local.plugin -q qlog_output_client.qlog -G 999999 ::1 4433

There's still something that goes wrong, although I am unable to discern the problem. On the aioquic server I am currently using only sending uniflow 0 until sending uniflow 1 becomes available; then only sending uniflow 1 is used (for testing purposes).

I have included the QLOGs from the aioquic server and the PQUIC client, along with the console log from the PQUIC client; see Logs.zip.

@mpiraux
Member

mpiraux commented Jan 25, 2021

Something broke in the QLOG plugin; I am investigating further.

@mpiraux
Member

mpiraux commented Jan 25, 2021

@The3ternum, d2acec6 has a fix for the QLOG bug.
