Revisit serialization format decision #144

Closed · rmarx opened this issue Mar 31, 2021 · 10 comments

rmarx commented Mar 31, 2021

In draft-02, we decided to keep using JSON as the main serialization format, with NDJSON as a streaming option. We did update the data definition language used in the draft to make it easier to define custom serializations into e.g., binary formats, if others want a more performant option.
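To make the distinction concrete, here is a minimal Python sketch contrasting the two modes; the event fields follow the general shape of qlog events (time, name, data) but are purely illustrative, not the normative schema:

```python
import json

# Two hypothetical events, illustrative only.
events = [
    {"time": 0.0, "name": "transport:packet_sent", "data": {"packet_size": 1252}},
    {"time": 1.3, "name": "transport:packet_received", "data": {"packet_size": 1252}},
]

# Classic JSON: the document is only valid once it is complete.
with open("trace.qlog", "w") as f:
    json.dump({"qlog_version": "draft-02", "events": events}, f)

# NDJSON: one self-contained JSON object per line, so a crashing
# endpoint still leaves a readable (if truncated) log behind.
with open("trace.ndjson", "w") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")
```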

We might need to revisit this decision and still go for a binary format by default. CBOR has so far been named most often as a potential option, since it is an IETF standard (as opposed to, e.g., Protocol Buffers or FlatBuffers and similar) and has proven itself in other protocol-related use cases as well.

The question remains: do we consciously limit ourselves to a select few serialization formats? And even if we don't, which "default" formats do we commit to in the documents?

Several people are of the opinion that it's enough to stick with, for example, JSON as the main format in the qlog specification, and to let people write converters from/to JSON themselves for more performant options. This is because even with something like CBOR, not all applications will want to employ it directly, and converters would still be needed.
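As an illustration of how mechanical such a converter can be, here is a minimal sketch using the third-party cbor2 package (one common choice, not one the thread prescribes); the file names are hypothetical:

```python
import json

import cbor2  # third-party: pip install cbor2

# Read a JSON-serialized qlog trace and re-emit it as CBOR. CBOR's data
# model is a superset of JSON's, so the mapping is mechanical.
with open("trace.qlog") as f:
    trace = json.load(f)

with open("trace.cbor", "wb") as f:
    cbor2.dump(trace, f)

# Converting back is just as mechanical.
with open("trace.cbor", "rb") as f:
    assert cbor2.load(f) == trace
```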


rmarx commented Apr 7, 2021

Another simple way of viewing this (and one of the main questions I feel we need to answer ASAP post-adoption) is to ask what the main goal of qlog is:

  1. Define a logging format that makes it easy for protocol implementations to log data (efficiently, at scale)
  2. Define a logging format that makes it easy to create reusable tooling

(I think it was @nibanks who put it like this, but for the life of me, I can't find the source).

If 1), we probably need to go full binary and optimize for write speed/size. 2) is imo what we have now, with a relatively verbose JSON setup. If we want to go for both at the same time (which I'm not sure is possible and might lead to the "worst of both worlds"), we'd probably end up with something like CBOR.

In my personal experience, it's the tooling part that's difficult, and something most parties don't want to invest in themselves or lack the necessary expertise for in their networking/protocol teams. Making tooling easier (e.g., even using things like jq or simple Python) and more reusable seems like it should stay the main goal. While the current qvis toolsuite doesn't deal well with larger qlog files, that's mainly because it's web-based, not because JSON can't scale to hundreds of megabytes or even gigabytes in a native tool. Of course, that's just my opinion :)
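As an example of the kind of "simple Python" tooling meant here, a throwaway filter over an NDJSON event stream might look like the sketch below; "recovery:metrics_updated" is taken from the qlog QUIC event definitions, but substitute whatever event names your stack actually emits:

```python
import json
import sys

# Print only recovery metric updates from an NDJSON qlog stream on stdin,
# e.g.: python filter.py < trace.ndjson
for line in sys.stdin:
    event = json.loads(line)
    if event.get("name") == "recovery:metrics_updated":
        print(event["time"], event["data"])
```

A jq one-liner could do the same; the point is that plain-text JSON keeps this class of ad-hoc tooling cheap.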


nibanks commented Apr 7, 2021

I think we can create translation libraries/tools easily enough between formats. The question then becomes: what do you want to do?

a) Optimize for tools - Standardize around a single format (JSON, most likely?) that tools can easily use. QUIC implementations then have the choice to either write directly to that format, or to have a custom format and a custom post-processing tool to convert to the tool format.

b) Optimize for implementations - Standardize around a binary format (and helper library(s)?) for QUIC implementations to efficiently write to. Tools can either read from that format, or perhaps someone could write a helper library to convert to some more easily consumed format. While I like the idea of trying to optimize for implementations, I wonder if it's just going to open a bigger can of worms. A lot of folks will have strong opinions here, and likely already have some solution they use for other performant logging anyway.

The more I think about it, I kind of lean towards a). Implementations can (and will) do what they want. If JSON is optimal enough for them (it seems it is for FB?) then they can use it. If they want something more/different, they can implement a translation layer/tool (what MsQuic partially has). And best of all, it makes the standardization process simpler.

One last thought, though: while web-based solutions are currently the reason the tools are slow, I do expect JSON parsing to be significantly slower than a binary format, especially at GB file sizes. It'd be interesting to see a perf comparison. The best way I can think of is to add qlog file format support to WPA and update MsQuic's qlog translation layer, then grab a large trace from MsQuic (binary ETW) and convert it to qlog. Then separately open the binary version and the qlog version and measure how long each takes.
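A rough stand-in for such a comparison (not the WPA/ETW experiment proposed above) could simply time the two parsers over the same trace; this sketch assumes the third-party cbor2 package and a hypothetical trace.qlog file:

```python
import json
import time

import cbor2  # third-party: pip install cbor2

# Load a JSON trace once, derive an equivalent CBOR blob, then time
# repeated parses of each. Not a rigorous benchmark, just a first look.
with open("trace.qlog", "rb") as f:
    json_bytes = f.read()
cbor_bytes = cbor2.dumps(json.loads(json_bytes))

def measure(label, parse, blob, runs=5):
    start = time.perf_counter()
    for _ in range(runs):
        parse(blob)
    per_run = (time.perf_counter() - start) / runs
    print(f"{label}: {len(blob)} bytes, {per_run:.3f}s per parse")

measure("JSON", json.loads, json_bytes)
measure("CBOR", cbor2.loads, cbor_bytes)
```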


rmarx commented May 19, 2021

One other option to consider is the use of / overlap with the PCAP-NG format typically used for packet captures.

This is apparently being considered for adoption on opsawg and might be flexible enough to include the endpoint-specific data we want to add to the mix.

I still need time to analyze what PCAP-NG actually does, but initial discussion on this was on the mailing list at: https://mailarchive.ietf.org/arch/msg/qlog/2bSRgRdaRleLhTDFv_C4DYZ3zng/

One benefit of this would be that we can easily log raw (encrypted) packets along with endpoint data, as in normal .pcaps (though I'm not sure how useful that is). A downside is that the format is (AFAIK) barely supported outside of tools like Wireshark (e.g., no easy, mature open source parsers available, though I could be completely wrong on that count).


mcr commented May 21, 2021

It might not be the encrypted packets you are logging: it might be the DNS requests (whether encrypted or not), or even the ICMP Packet Too Big or ICMP Port Unreachable messages.


LPardue commented Jul 13, 2021

As an implementer, why would I modify my endpoint to take a packet capture when e.g. tcpdump can already do that?


mcr commented Jul 13, 2021

> As an implementer, why would I modify my endpoint to take a packet capture when e.g. tcpdump can already do that?

Because tcpdump can't capture (a) the cleartext packets inside the QUIC/TLS encryption, or (b) your state transitions.
The reason you might want it all in pcap-ng format is so that you can combine an external view of the packets (including DNS requests, ICMPs, and TCP activity) with your internal capture.
You'd do a zipper merge on the data so that you can see that your internal state transition followed a failed DNS request, or something like that.
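A sketch of that zipper merge, assuming both streams have already been reduced to (time, source, description) tuples sorted by timestamp (all values here are hypothetical):

```python
import heapq

# Hypothetical, already-sorted event streams: one from an external
# capture (pcap-ng), one from the endpoint's internal qlog.
external = [
    (0.8, "pcap", "DNS query AAAA example.com"),
    (1.1, "pcap", "DNS response SERVFAIL"),
]
internal = [
    (1.2, "qlog", "connectivity:connection_state_updated -> failed"),
]

# heapq.merge interleaves sorted iterables: the "zipper" merge described
# above, yielding one combined timeline ordered by timestamp.
for time_ms, source, description in heapq.merge(external, internal):
    print(f"{time_ms:6.1f}ms [{source}] {description}")
```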


LPardue commented Jul 13, 2021

In my experience, when my QUIC client application fails to resolve a name, I don't even create a UDP socket or QUIC connection object. My client writes an error message to stderr, possibly reporting the error returned by my resolution syscall.

Combining with wire packet captures seems to have marginal value when those packets don't contain more information than is available to the client.


LPardue commented Aug 2, 2021

Discussed during IETF 111. The feeling in the room was to stick with JSON serialization as the canonical interop format for qlog. Use of JSON does not prevent other serialization formats, but we can constrain our scope of work to focus on one in this set of deliverables.


LPardue commented Aug 2, 2021

For clarification, the present specifics of the document's JSON serialization definitions are a starting point for further development should the WG declare consensus on using JSON.

The discussion about streaming serialization (whether NDJSON or some other format) is separate, so I've created #172.


rmarx commented Aug 18, 2021

Given the consensus, I am closing this issue. The main related subissues are tracked in #172 and #143 going forward.

rmarx closed this as completed Aug 18, 2021