Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow index-based referencing for all event field names #8

Closed
rmarx opened this issue Apr 5, 2019 · 1 comment
Closed

Allow index-based referencing for all event field names #8

rmarx opened this issue Apr 5, 2019 · 1 comment

Comments

@rmarx
Copy link
Contributor

rmarx commented Apr 5, 2019

Currently, we have a concept that you can have a "groups_id" object in "common_fields".
If you then have a "group_id" in your "events_field", the value for that with each event is an index into the "groups_id" field, to prevent replication of complex fields.

We could make this more general, applicable to each field name.
E.g., if you know up-front which CATEGORY values you support, you could do something like:

{
    "common_fields": {
        "group_id": "127ecc830d98f9d54a42c4f0842aa87e181a",
        "ODCID": "127ecc830d98f9d54a42c4f0842aa87e181a",
        "CATEGORY": ["PACKET_RX", "DATA_FRAME_NEW"]
        "protocol_type":  "QUIC_HTTP3",
        "reference_time": "1553986553572"
    },
    "event_fields": [
        "delta_time",
        "CATEGORY",
        "EVENT_TYPE",
        "TRIGGER",
        "DATA"
    ],
    "events": [[
            2,
            "TRANSPORT",
            0,
            "LINE",
            [...]
        ],[
            7,
            "APPLICATION",
            1,
            "GET",
            [...]
        ],
        ...
    ]
}

We would then have a general rule:

If the field is present as a value of type array in "common_fields" AND the field-name is present in "event_fields", the value per-event MUST be treated as an index into the "common_fields" entry, rather than taken as a direct value.

("groups_id" would then also be renamed to "group_id" in "common_fields")

This would allow smaller file sizes (and less string writing overhead) for applications that have a static list of e.g., CATEGORY or EVENT_TYPE up front.
Downside 1 is that it complicates the tools reading the file (but only a bit imo).
Downside 2 is that is complicates humans reading the file (so it depends on the use case).
(either way, it's easy to go from 1 to the other with a simple script)


This concept could be extended to make it a fully self-describing format.
In other words, we could also describe the fields in the DATA for known events up-front and replace those entries with in-order arrays of the values instead of key-value object definitions.

Very high-level concept (probably needs proper description of the fields etc.):

"data_fields" : {
      "TRANSPORT+PACKET_SENT" : [
         "frame_type",
         "packet_number",
         "frames"
     ]
}
...
[ 57, "TRANSPORT", "PACKET_SENT", "TRIGGER", ["STREAM", 15, [...]]]

Taking this all to the extreme, you could have a fully self-describing format that lists all known events (and potentially some values, similar to QPack's static table) up-top and then each entry just uses indexes + potentially a few raw values. However, I'm personally not of the opinion this added complexity is worth it.

@rmarx rmarx added the design label Apr 5, 2019
@rmarx rmarx changed the title Extend the group_id idea to everything Allow index-based referencing for event field values Apr 8, 2019
@rmarx rmarx changed the title Allow index-based referencing for event field values Allow index-based referencing for all event field names Apr 8, 2019
@rmarx
Copy link
Contributor Author

rmarx commented Oct 14, 2019

In the interest of keeping things simple and because there are few situations where this has turned up in the meantime, I've decided for now to only allow this for "group_id". Further fields that benefit from this can be added per-instance.

Note that Chrome's netlog format does this, and it severely compromises the user's ability to understand and grep those files, despite them using .json as a substrate as well.

@rmarx rmarx closed this as completed Oct 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant