Set and distribution metrics are currently sent as float arrays in JSON, which causes significant overhead in various parts of the system.
The float arrays are expensive to parse, and consumers that are not interested in the individual values still need to parse each float (without a specialised JSON parser). In practice the float arrays are also a very inefficient encoding, requiring much more than 8 bytes (f64) per value.
All array values are converted to little-endian byte order before compression and encoding.
To increase overall throughput we want to use a more efficient encoding of the float arrays for set and distribution metrics.
Before switching from JSON to a binary format like CBOR we want to explore different ways to encode and compress the data with JSON first.
Benchmarks regarding throughput and compression have been collected in getsentry/bucket-compression.
We identified zstd as a very good general-purpose algorithm that should give us good wins across the board. To encode binary (zstd-compressed) data we will use Base64 (without padding).
Note: zstd is used in its streaming mode.
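A minimal sketch of the encoding pipeline described above (the function name is ours; the zstd step is shown only as a comment, since it requires a third-party library such as `zstandard`):

```python
import base64
import struct

def encode_distribution(values: list[float]) -> str:
    """Pack f64 values as little-endian bytes, then Base64-encode them.

    This yields the "base64" format; for the "zstd" format the packed
    bytes would additionally be zstd-compressed (streaming mode) before
    the Base64 step.
    """
    raw = struct.pack(f"<{len(values)}d", *values)  # little-endian f64
    # For "zstd": raw = zstandard.ZstdCompressor().compress(raw)
    return base64.b64encode(raw).decode("ascii").rstrip("=")  # no padding
```

For example, `encode_distribution([3.0, 1.0, 2.0])` produces exactly the Base64 distribution payload used in the test section below.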
Proposed schema change for distribution and set values (`value` field):
```json
{
  "oneOf": [
    { "type": "array", "items": { "type": "number" } },
    {
      "type": "object",
      "properties": {
        "format": { "const": "array" },
        "data": { "type": "array", "items": { "type": "number" } }
      }
    },
    {
      "type": "object",
      "properties": {
        "format": { "const": "base64" },
        "data": { "type": "string" }
      }
    },
    {
      "type": "object",
      "properties": {
        "format": { "const": "zstd" },
        "data": { "type": "string" }
      }
    }
  ]
}
```
Examples:

```json
{
  "org_id": 1,
  "project_id": "12345...",
  "timestamp": 1615889440,
  "width": 10,
  "name": "endpoint.response_time",
  "tags": {
    "route": "user_index"
  },
  "type": "d",
  "value": {"format": "zstd", "data": "<base64>"}
}
```

```json
{
  "name": "endpoint.response_time",
  ...
  "type": "s",
  "value": {"format": "array", "data": [13.37, 42, 3.14159265358979323846264338327950288]}
}
```
Note: We want to stay compatible with the current JSON float array values, since we have components which write directly to Kafka and do not use Relay as their ingestion path.
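To illustrate this compatibility requirement, a consumer-side decoder could accept both the legacy plain array and the new tagged objects. This is a sketch under our own naming; the `zstd` branch is stubbed out because it needs a third-party zstd library:

```python
import base64
import struct

def decode_distribution_value(value) -> list[float]:
    """Decode a distribution "value" field in any of the supported shapes."""
    if isinstance(value, list):          # legacy plain float array
        return value
    fmt, data = value["format"], value["data"]
    if fmt == "array":                   # new schema, values still inline
        return data
    if fmt == "base64":                  # packed little-endian f64, Base64
        raw = base64.b64decode(data + "=" * (-len(data) % 4))  # tolerate missing padding
        return list(struct.unpack(f"<{len(raw) // 8}d", raw))
    if fmt == "zstd":
        # Would additionally zstd-decompress before unpacking, e.g. with
        # zstandard.ZstdDecompressor() (third-party library).
        raise NotImplementedError("zstd decoding needs a zstd library")
    raise ValueError(f"unknown format: {fmt}")
```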
## Milestone 1 - New JSON Schema
As a first milestone we want to support the new JSON schema with the `array` format. This does not change the encoding of the values yet.
We will have to add support for the new format in multiple systems:
### Relay
- [ ] https://github.com/getsentry/relay/pull/3137
- [ ] https://github.com/getsentry/sentry/pull/65410
### Consumers
- [x] Rework Schema Validation
- [x] Increase observability with additional metrics
- [x] Check Last-Seen-Updater
- [ ] https://github.com/getsentry/sentry-kafka-schemas/pull/222
### Snuba
- [ ] https://github.com/getsentry/snuba/pull/5560
- [ ] https://github.com/getsentry/snuba/pull/5617
- [x] Tweak DLQ Configuration
- [x] Additional Observability. Messages in the new format / time to decode a message
## Milestone 2 - base64 and zstd
Add support for the `base64` and `zstd` encodings.
### Relay
- [ ] https://github.com/getsentry/relay/pull/3218
- [ ] https://github.com/getsentry/sentry/pull/66588
- [ ] https://github.com/getsentry/relay/pull/3252
### Snuba
- [x] Add support for the `zstd` format
- [ ] https://github.com/getsentry/snuba/pull/5761
## Milestone X - Future
- [ ] Update other producers which do not ingest via Relay
- [ ] Deprecate old format and/or remove support for the old format
- [ ] Add support for additional compressions (lossy?)
- [ ] Switch to a binary protocol like CBOR (instead of JSON)
- [ ] Relay: Use optimized JSON format for the metrics bulk endpoint
- [ ] Relay: Switch between different compressions based on the bucket contents (e.g. small buckets vs. big buckets etc.)
## Test
### Base64
#### Base64 - Distribution
Expected distribution values: `[3, 1, 2]`.

```json
{
  "name": "d:transactions/foo@none",
  "org_id": 0,
  "project_id": 42,
  "retention_days": 90,
  "tags": {},
  "timestamp": 1712219392,
  "type": "d",
  "value": {
    "data": "AAAAAAAACEAAAAAAAADwPwAAAAAAAABA",
    "format": "base64"
  }
}
```
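The Base64 payload above can be checked by hand with the standard library alone: it is three little-endian f64 values.

```python
import base64
import struct

raw = base64.b64decode("AAAAAAAACEAAAAAAAADwPwAAAAAAAABA")
values = struct.unpack(f"<{len(raw) // 8}d", raw)  # little-endian f64
print(list(values))  # [3.0, 1.0, 2.0]
```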
#### Base64 - Set
Expected set values: `{1, 7}`.

```json
{
  "name": "s:transactions/bar@none",
  "org_id": 0,
  "project_id": 42,
  "retention_days": 90,
  "tags": {},
  "timestamp": 1712219392,
  "type": "s",
  "value": {
    "data": "AQAAAAcAAAA=",
    "format": "base64"
  }
}
```
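The set payload is 8 bytes for two members, so set values appear to be encoded as little-endian u32 rather than f64 (an inference from the fixture size, not stated explicitly above):

```python
import base64
import struct

raw = base64.b64decode("AQAAAAcAAAA=")
members = set(struct.unpack(f"<{len(raw) // 4}I", raw))  # little-endian u32
print(members)  # {1, 7}
```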
### Zstd
#### Zstd - Distribution
Expected distribution values: `[1, 2, 3]`.

```json
{
  "name": "d:transactions/foo@none",
  "org_id": 0,
  "project_id": 42,
  "retention_days": 90,
  "tags": {},
  "timestamp": 1712219148,
  "type": "d",
  "value": {
    "data": "KLUv/QBYrQAAcAAA8D8AQAAAAAAAAAhAAgBgRgCw",
    "format": "zstd"
  }
}
```
#### Zstd - Set
Expected set values: `{1, 7}`.

```json
{
  "name": "s:transactions/bar@none",
  "org_id": 0,
  "project_id": 42,
  "retention_days": 90,
  "tags": {},
  "timestamp": 1712219148,
  "type": "s",
  "value": {
    "data": "KLUv/QBYQQAAAQAAAAcAAAA=",
    "format": "zstd"
  }
}
```
## Rollout Plan (for each Milestone)
- Roll out each service independently with support for both the old and the new JSON encoding.
- S4S: Enable the new format by letting Relay send the updated JSON, staggered for each namespace independently (start with `custom`, end with `transactions`).
- SaaS: Enable the new format, starting with `custom` and ending with `transactions` (see Step 2).
## Rollback
The option to stop Relay from sending the new format can be rolled back immediately; Relay will stop sending the new format within 10 seconds.