Allow index-based referencing for all event field names #8

rmarx · 2019-04-05T16:27:53Z

Currently, we have a concept that you can have a "groups_id" object in "common_fields".
If you then have a "group_id" in your "events_field", the value for that with each event is an index into the "groups_id" field, to prevent replication of complex fields.

We could make this more general, applicable to each field name.
E.g., if you know up-front which CATEGORY values you support, you could do something like:

{
    "common_fields": {
        "group_id": "127ecc830d98f9d54a42c4f0842aa87e181a",
        "ODCID": "127ecc830d98f9d54a42c4f0842aa87e181a",
        "CATEGORY": ["PACKET_RX", "DATA_FRAME_NEW"]
        "protocol_type":  "QUIC_HTTP3",
        "reference_time": "1553986553572"
    },
    "event_fields": [
        "delta_time",
        "CATEGORY",
        "EVENT_TYPE",
        "TRIGGER",
        "DATA"
    ],
    "events": [[
            2,
            "TRANSPORT",
            0,
            "LINE",
            [...]
        ],[
            7,
            "APPLICATION",
            1,
            "GET",
            [...]
        ],
        ...
    ]
}

We would then have a general rule:

If the field is present as a value of type array in "common_fields" AND the field-name is present in "event_fields", the value per-event MUST be treated as an index into the "common_fields" entry, rather than taken as a direct value.

("groups_id" would then also be renamed to "group_id" in "common_fields")

This would allow smaller file sizes (and less string writing overhead) for applications that have a static list of e.g., CATEGORY or EVENT_TYPE up front.
Downside 1 is that it complicates the tools reading the file (but only a bit imo).
Downside 2 is that is complicates humans reading the file (so it depends on the use case).
(either way, it's easy to go from 1 to the other with a simple script)

This concept could be extended to make it a fully self-describing format.
In other words, we could also describe the fields in the DATA for known events up-front and replace those entries with in-order arrays of the values instead of key-value object definitions.

Very high-level concept (probably needs proper description of the fields etc.):

"data_fields" : {
      "TRANSPORT+PACKET_SENT" : [
         "frame_type",
         "packet_number",
         "frames"
     ]
}
...
[ 57, "TRANSPORT", "PACKET_SENT", "TRIGGER", ["STREAM", 15, [...]]]

Taking this all to the extreme, you could have a fully self-describing format that lists all known events (and potentially some values, similar to QPack's static table) up-top and then each entry just uses indexes + potentially a few raw values. However, I'm personally not of the opinion this added complexity is worth it.

The text was updated successfully, but these errors were encountered:

rmarx · 2019-10-14T09:29:52Z

In the interest of keeping things simple and because there are few situations where this has turned up in the meantime, I've decided for now to only allow this for "group_id". Further fields that benefit from this can be added per-instance.

Note that Chrome's netlog format does this, and it severely compromises the user's ability to understand and grep those files, despite them using .json as a substrate as well.

rmarx added the design label Apr 5, 2019

rmarx changed the title ~~Extend the group_id idea to everything~~ Allow index-based referencing for event field values Apr 8, 2019

rmarx changed the title ~~Allow index-based referencing for event field values~~ Allow index-based referencing for all event field names Apr 8, 2019

rmarx added the high-level-schema label Apr 8, 2019

rmarx mentioned this issue Apr 19, 2019

Allow raw logging of packets and/or frames #11

Closed

rmarx closed this as completed Oct 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow index-based referencing for all event field names #8

Allow index-based referencing for all event field names #8

rmarx commented Apr 5, 2019 •

edited

Loading

rmarx commented Oct 14, 2019

Allow index-based referencing for all event field names #8

Allow index-based referencing for all event field names #8

Comments

rmarx commented Apr 5, 2019 • edited Loading

rmarx commented Oct 14, 2019

rmarx commented Apr 5, 2019 •

edited

Loading