Flag categorize labels on streams response #10419
Conversation
(force-pushed from d67e257 to b3cff8a)
Looks brilliant 💎. Great job @salvacorts
@@ -710,6 +711,272 @@ func TestQueryTSDB_WithCachedPostings(t *testing.T) {

}

func TestCategorizedLabels(t *testing.T) {
	clu := cluster.New(nil, cluster.SchemaWithTDSBAndNonIndexedLabels)
let's rename it to SchemaWithTDSBAndStructuredMetadata
Done
I like the idea, but I'd personally prefer something that's not a header to determine the version (a version parameter in the query string or body, or a different API version prefix in the path), but that's a nit.
I also think it makes more sense, and will be more efficient for us, to not treat each stream as distinct based on (labels, structuredMetadata, parsed), but rather on labels alone. It also feels more semantically correct, as different log lines with separate structuredMetadata or parsed labels are still part of the same stream.
For instance:
{
"stream": {
"labels": {
"cluster": "us-central",
"namespace": "loki",
"container": "query-frontend"
},
"structuredMetadata": {
"traceID": "68810cf0c94bfcca"
},
"parsed": {
"level": "info"
}
},
"values": [...]
}
should become
{
"stream": {
"cluster": "us-central",
"namespace": "loki",
"container": "query-frontend"
},
"values": [
{
"ts": 0,
"entry": "entry0",
"structuredMetadata": {
"traceID": "68810cf0c94bfcca"
},
"parsed": {
"level": "info"
}
},
...
]
}
There are a few places in codecs and response merging (iirc) that will benefit from the reduced number of streams this will produce. Since it also feels more semantically correct, I think it's the way to go.
Would it not mean a difference in behaviour between metric and log queries? I can't think of a way to do a similar categorization of labels in metric queries. I think the stream should have all the labels merged together.
nice work Salva! I have left some minor suggestions. I would like to do another review once those are taken care of.
integration/cluster/schema.go
Outdated
tsdbShipperSchemaWithStructuredMetadataConfigTemplate = `
schema_config:
  configs:
    - from: {{.curPeriodStart}}
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
`
we need not add this, you can just do:
clu := cluster.New(nil, cluster.SchemaWithTSDB, func(c *cluster.Cluster) {
c.SetSchemaVer("v13")
})
Done thank you!
now = time.Now()
require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineA", now.Add(-1*time.Second), map[string]string{"job": "fake"}))
require.NoError(t, cliDistributor.PushLogLineWithTimestampAndStructuredMetadata("lineB", now.Add(-2*time.Second), map[string]string{"traceID": "123", "user": "a"}, map[string]string{"job": "fake"}))
require.NoError(t, tIngester.Restart())
if you are restarting the ingester to test your changes with data served from queriers, I think you need to disable the WAL as well. You can do it by setting the "-ingester.wal-enabled=false" flag on the ingesters.
Done thank you!
pkg/logql/log/labels.go
Outdated
	return cached
}

return NewLabelsResult(b.buf.String(), hash, stream, structuredMetadata, parsed)
we are not filling back the cache here, is it intentional?
Good catch. Thank you!
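The cache fill-back fix above boils down to writing the freshly built result into the hash-keyed map before returning it. A minimal sketch, assuming a simplified `Builder`/`LabelsResult` pair (not Loki's actual types):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// LabelsResult is a stand-in for the cached, serialized label set.
type LabelsResult struct {
	Serialized string
	Hash       uint64
}

// Builder caches results by the hash of the serialized labels.
type Builder struct {
	resultCache map[uint64]*LabelsResult
}

func NewBuilder() *Builder {
	return &Builder{resultCache: make(map[uint64]*LabelsResult)}
}

func (b *Builder) LabelsResult(serialized string) *LabelsResult {
	h := fnv.New64a()
	h.Write([]byte(serialized))
	hash := h.Sum64()

	if cached, ok := b.resultCache[hash]; ok {
		return cached
	}
	res := &LabelsResult{Serialized: serialized, Hash: hash}
	b.resultCache[hash] = res // fill the cache back on a miss
	return res
}

func main() {
	b := NewBuilder()
	first := b.LabelsResult(`{job="fake"}`)
	second := b.LabelsResult(`{job="fake"}`)
	fmt.Println(first == second) // same pointer: the second call hit the cache
}
```

Without the fill-back line, every call would rebuild the result and the cache would never hit, which is exactly what the review comment caught.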
pkg/logql/log/labels_test.go
Outdated
// cached.
assertLabelResult(t, expected, b.LabelsResult())
assertLabelResult(t, expected, actual)
doesn't look like it would test the cache
You are right, thank you.
repeated LabelPairAdapter parsed = 4 [
  (gogoproto.nullable) = false,
  (gogoproto.jsontag) = "parsed,omitempty"
];
I would love to avoid doing this change since it adds an unexpected field to the push payload. I know it is hard but maybe leave a todo and let us try fixing it in a separate PR to avoid overcomplicating this one.
I would love to avoid doing this change
Likewise! The problem is this is the object we move around everywhere and throughout our components, so we need to send it around somewhere.
it adds an unexpected field to the push payload
I don't think that's a problem. Protobuf is extensible and backwards compatible when adding fields to the end of an object definition. So in this case, it's still backwards compatible.
I am not worried about backwards compatibility, my only concern is having an unwanted field in the proto definitions on the write path. I understand it is hard to do, so maybe leave a todo for now.
@@ -310,16 +346,30 @@ func encodeStream(stream logproto.Stream, s *jsoniter.Stream) error {
	s.WriteRaw(`"`)
	s.WriteMore()
	s.WriteStringWithHTMLEscaped(e.Line)
	if len(e.StructuredMetadata) > 0 {

	if categorizeLabels && (len(e.StructuredMetadata) > 0 || len(e.Parsed) > 0) {
I think for consistency we should write an empty object ({}) instead of skipping it when we have no structured metadata and no parsed labels.
Do you mean always serializing:
{
"structuredMetadata": {},
"parsed": {}
}
If so, I think it's better not to do it. It will add a lot of overhead to the response.
Reviving this conversation due to @svennergr expecting {} to be returned here.
I'll create a follow-up PR to return a third item {} when there is no structured metadata nor parsed labels.
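The agreed behaviour, always emitting the third element and writing `{}` when the entry has neither structured metadata nor parsed labels, can be sketched as below. This uses the standard `encoding/json` package rather than Loki's jsoniter-based encoder, and `encodeEntry` is a hypothetical helper name:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeEntry serializes one value as [timestamp, line, metadataObject].
// The trailing object is always present; it is {} when the entry carries
// neither structured metadata nor parsed labels.
func encodeEntry(ts, line string, structuredMetadata, parsed map[string]string) ([]byte, error) {
	meta := map[string]map[string]string{}
	if len(structuredMetadata) > 0 {
		meta["structuredMetadata"] = structuredMetadata
	}
	if len(parsed) > 0 {
		meta["parsed"] = parsed
	}
	return json.Marshal([]any{ts, line, meta})
}

func main() {
	out, _ := encodeEntry("1701096036028751750", "example line", nil, nil)
	fmt.Println(string(out)) // trailing {} even with no metadata
}
```

Always emitting the object costs a couple of bytes per entry but gives consumers a fixed-arity array, which is what the Grafana side expects.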
The changes look great. Left just a few minor comments and then we are good to merge it.
if err := jsonparser.ObjectEach(value, func(key []byte, value []byte, dataType jsonparser.ValueType, _ int) error {
	if dataType != jsonparser.String {
		return jsonparser.MalformedStringError
	if dataType == jsonparser.Object {
can you please leave a comment saying we parse both push request and query response json here to make it clear for others?
Done
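The dual-format parsing the comment asks to document can be sketched with the standard library instead of jsonparser: a value entry is an array whose elements may be strings (timestamp, line) or an object (categorized labels), and `json.RawMessage` lets us branch on the element's first byte. `parseEntry` is a hypothetical helper, not Loki's actual function:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseEntry handles both shapes of a value entry: the legacy
// ["ts", "line"] pair and the categorized ["ts", "line", {...}] triple.
func parseEntry(data []byte) (ts, line string, meta map[string]map[string]string, err error) {
	var parts []json.RawMessage
	if err = json.Unmarshal(data, &parts); err != nil {
		return
	}
	for i, p := range parts {
		switch {
		case len(p) > 0 && p[0] == '{': // object: categorized labels
			err = json.Unmarshal(p, &meta)
		case i == 0:
			err = json.Unmarshal(p, &ts)
		case i == 1:
			err = json.Unmarshal(p, &line)
		}
		if err != nil {
			return
		}
	}
	return
}

func main() {
	ts, line, meta, _ := parseEntry([]byte(`["0","entry0",{"parsed":{"level":"info"}}]`))
	fmt.Println(ts, line, meta["parsed"]["level"])
}
```

Branching on the value type rather than the position is what lets one parser serve both the push request and the query response JSON.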
builder := labels.NewBuilder(lbls)
for _, label := range entry.StructuredMetadata {
	builder.Del(label.Name)
}
for _, label := range entry.Parsed {
	builder.Del(label.Name)
}
Since this is on a hot path, it would be good to avoid doing this and try building the final response in the iterators. Maybe leave a todo and try doing it in a separate PR.
Added a todo
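The cleanup above — removing labels from the stream set when they also appear as structured metadata or parsed labels, so they are not reported twice — can be sketched with plain slices. `Label` and `removeCategorized` are hypothetical stand-ins for Loki's `labels` types:

```go
package main

import "fmt"

// Label is a simplified name/value pair.
type Label struct{ Name, Value string }

// removeCategorized drops from the stream labels anything that also
// appears as structured metadata or a parsed label.
func removeCategorized(stream, structuredMetadata, parsed []Label) []Label {
	drop := make(map[string]bool, len(structuredMetadata)+len(parsed))
	for _, l := range structuredMetadata {
		drop[l.Name] = true
	}
	for _, l := range parsed {
		drop[l.Name] = true
	}
	out := make([]Label, 0, len(stream))
	for _, l := range stream {
		if !drop[l.Name] {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	stream := []Label{{"job", "fake"}, {"traceID", "123"}, {"level", "info"}}
	got := removeCategorized(stream,
		[]Label{{"traceID", "123"}},
		[]Label{{"level", "info"}})
	fmt.Println(got)
}
```

Doing this per entry on the hot path allocates a map and a slice each time, which is why the review suggests building the final response in the iterators instead.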
pkg/logql/log/labels.go
Outdated
hash := b.hasher.Hash(buf)
if cached, ok := b.resultCache[hash]; ok {
	return cached
}
res := NewLabelsResult(buf.Copy(), hash)

res := NewLabelsResult(buf.String(), hash, buf, nil, nil)
I think we should be doing a copy of the buf since it is reused?
Good catch, done
LGTM
**What this PR does / why we need it**: This is a follow-up PR for #10419 adding support for tailing. I tested it on a dev cell and it works fine.

<img width="1296" alt="image" src="https://github.com/grafana/loki/assets/8354290/6177e0ca-02ce-48cd-a17f-0739dc3caa0a">

**Note**: With these changes, the JSON marshal/unmarshal functions for the tail are no longer used ([example][1]), so I think we can remove them. Also, the new Tail response is no longer used, so we can also make it an alias to the _legacy_ one. Let's do it in a follow-up PR to avoid making this one bigger.

[1]: https://github.com/grafana/loki/blob/52a3f16039dd5ff655fc3681257d99794f620ec4/pkg/loghttp/entry.go#L210-L238
…red metadata labels (#11325)

This is a followup PR for #10419 wrt this comment #10419 (comment). If there are no parsed nor structured-metadata labels in a given entry, and the categorize-labels header is set, add an empty object to the entry array.

For example, we currently return:

```json
{
  "data": {
    "encodingFlags": [
      "categorize-labels"
    ],
    "result": [
      {
        "stream": {
          "agent": "promtail",
          "filename": "/var/log/nginx/json_access.log",
          "host": "appfelstrudel",
          "job": "nginx_access_log"
        },
        "values": [
          [
            "1701096036028751750",
            "example line"
          ]
        ]
      }
    ]
  }
}
```

But we want:

```json
{
  "data": {
    "encodingFlags": [
      "categorize-labels"
    ],
    "result": [
      {
        "stream": {
          "agent": "promtail",
          "filename": "/var/log/nginx/json_access.log",
          "host": "appfelstrudel",
          "job": "nginx_access_log"
        },
        "values": [
          [
            "1701096036028751750",
            "example line",
            {} <<<--- WE WANT THIS
          ]
        ]
      }
    ]
  }
}
```
We recently introduced support for ingesting and querying structured metadata in Loki. This adds a new dimension to Loki's labels since now we arguably have three categories of labels: _stream_, _structured metadata_, and _parsed_ labels. Depending on the origin of the labels, they should be used in LogQL expressions differently to achieve optimal performance: _stream_ labels should be added to stream matchers, _structured metadata_ labels should be used in a filter expression before any parsing expression, and _parsed_ labels should be placed after the parser expression extracting them.

The Grafana UI has a hard time dealing with this same problem. Before grafana/grafana#73955, the filtering functionality in Grafana was broken since it was not able to distinguish between _stream_ and _structured metadata_ labels. Also, as soon as a parser expression was added to the query, filters added by Grafana would be appended to the end of the query regardless of the label category. The PR above implements a workaround for this problem but needs a better API on Loki's end to mitigate all corner cases.

Loki currently returns the following JSON for log queries:

```json
...
{
    "stream": {
        "cluster": "us-central",
        "container": "query-frontend",
        "namespace": "loki",
        "level": "info",
        "traceID": "68810cf0c94bfcca"
    },
    "values": [
        [
            "1693996529000222496",
            "1693996529000222496 aaaaaaaaa.....\n"
        ],
        ...
},
{
    "stream": {
        "cluster": "us-central",
        "container": "query-frontend",
        "namespace": "loki",
        "level": "debug",
        "traceID": "a7116cj54c4bjz8s"
    },
    "values": [
        [
            "1693996529000222497",
            "1693996529000222497 bbbbbbbbb.....\n"
        ],
        ...
},
...
```

As can be seen, there is no way we can distinguish the category of each label. This PR introduces a new flag `X-Loki-Response-Encoding-Flags: categorize-labels` that makes Loki return categorized labels as follows:

```json
...
{
    "stream": {
        "cluster": "us-central",
        "container": "query-frontend",
        "namespace": "loki"
    },
    "values": [
        [
            "1693996529000222496",
            "1693996529000222496 aaaaaaaaa.....\n",
            {
                "structuredMetadata": {
                    "traceID": "68810cf0c94bfcca"
                },
                "parsed": {
                    "level": "info"
                }
            }
        ],
        [
            "1693996529000222497",
            "1693996529000222497 bbbbbbbbb.....\n",
            {
                "structuredMetadata": {
                    "traceID": "a7116cj54c4bjz8s"
                },
                "parsed": {
                    "level": "debug"
                }
            }
        ],
        ...
},
...
```

Note that this PR only supports log queries, not metric queries. From a UX perspective, being able to categorize labels in metric queries doesn't have any benefit yet. Having said that, supporting this for metric queries would require some minor refactoring on top of what has been implemented here. If we decide to do that, I think we should do it in a separate PR to avoid making this PR even larger.

I also decided to leave out support for Tail queries to avoid making this PR even larger. Once this one gets merged, we can work to support tailing.

---

**Note to reviewers**

This PR is huge since we need to forward categorized labels all over the codebase (from parsing logs all the way to marshaling); fortunately, many of the changes come from updating tests and refactoring iterators.

Tested out in a dev cell with query `{stream="stdout"} | label_format new="text"`.

- Without the new flag:

```
$ http http://127.0.0.1:3100/loki/api/v1/query_range\?direction\=BACKWARD\&end\=1693996529322486000\&limit\=30\&query\=%7Bstream%3D%22stdout%22%7D+%7C+label_format+new%3D%22text%22\&start\=1693992929322486000 X-Scope-Orgid:REDACTED

{
    "data": {
        "result": [
            {
                "stream": {
                    "new": "text",
                    "pod": "loki-canary-986bd6f4b-xqmb7",
                    "stream": "stdout"
                },
                "values": [
                    [
                        "1693996529000222496",
                        "1693996529000222496 pppppppppppp...\n"
                    ],
                    [
                        "1693996528499160852",
                        "1693996528499160852 pppppppppppp...\n"
                    ],
                    ...
```

- With the new flag:

```
$ http http://127.0.0.1:3100/loki/api/v1/query_range\?direction\=BACKWARD\&end\=1693996529322486000\&limit\=30\&query\=%7Bstream%3D%22stdout%22%7D+%7C+label_format+new%3D%22text%22\&start\=1693992929322486000 X-Scope-Orgid:REDACTED X-Loki-Response-Encoding-Flags:categorize-labels

{
    "data": {
        "encodingFlags": [
            "categorize-labels"
        ],
        "result": [
            {
                "stream": {
                    "pod": "loki-canary-986bd6f4b-xqmb7",
                    "stream": "stdout"
                },
                "values": [
                    [
                        "1693996529000222496",
                        "1693996529000222496 pppppppppppp...\n",
                        {
                            "parsed": {
                                "new": "text"
                            }
                        }
                    ],
                    [
                        "1693996528499160852",
                        "1693996528499160852 pppppppppppp...\n",
                        {
                            "parsed": {
                                "new": "text"
                            }
                        }
                    ],
                    ...
```
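On the client side, opting in amounts to sending the `X-Loki-Response-Encoding-Flags: categorize-labels` header on a `query_range` request (the header name is the one this PR introduces). A minimal sketch in Go; `newCategorizedQuery` is a hypothetical helper and the base URL is a placeholder:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newCategorizedQuery builds a query_range request that asks Loki to
// return categorized labels via the encoding-flags header.
func newCategorizedQuery(base, query string) (*http.Request, error) {
	u, err := url.Parse(base + "/loki/api/v1/query_range")
	if err != nil {
		return nil, err
	}
	q := u.Query()
	q.Set("query", query)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Loki-Response-Encoding-Flags", "categorize-labels")
	return req, nil
}

func main() {
	req, _ := newCategorizedQuery("http://127.0.0.1:3100", `{stream="stdout"}`)
	fmt.Println(req.Header.Get("X-Loki-Response-Encoding-Flags"))
}
```

Because the flag is opt-in, clients that never send the header keep receiving the legacy response shape unchanged.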