-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(chunks-inspect): support structured metadata #11506
base: main
Are you sure you want to change the base?
Conversation
Trivy scan found the following vulnerabilities:
|
cmd/chunks-inspect/loki.go
Outdated
c, _ := chunkenc.NewByteChunk(data, 0, 0) | ||
bs := c.Blocks(from, through) | ||
fmt.Println("Blocks: ", len(bs)) | ||
|
||
pipeline := log.NewNoopPipeline() | ||
for idx, block := range bs { | ||
fmt.Println("Block : ", idx) | ||
fmt.Println("MinTime: ", time.Unix(0, block.MinTime()).UTC()) | ||
fmt.Println("MaxTime: ", time.Unix(0, block.MaxTime()).UTC()) | ||
fmt.Println("Offset : ", block.Offset()) | ||
iter := block.Iterator(context.Background(), pipeline.ForStream(nil)) | ||
for iter.Next() { | ||
e := iter.Entry() | ||
fmt.Println(e.Timestamp.UTC(), " ", e.Line) | ||
for _, meta := range e.StructuredMetadata { | ||
fmt.Println("\t", meta.Name, "=", meta.Value) | ||
} | ||
} | ||
} | ||
|
||
err := c.Close() | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a reasonable change, but looking at the structure of the rest of the code I think printing of all these details would be done in main.go
, it looks to me like process just builds the details of the chunk into something that main.go
prints
I haven't been keeping up with structured metadata. If it's separate from a chunk then imo we should either have a separate function to gather and return the structured metadata to main.go
or an additional return value in parseLokiChunk
for the metadata, perhaps with a parameter to choose whether or not you want to receive info about the structured metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @cstyan for your feedback! I will restructure the code and go forward.
79694b3
to
2423fba
Compare
2423fba
to
d803d44
Compare
Hi @cstyan! I've moved the output to the |
I can do another review sometime this week 👍 for now it looks like there's some linting failures
|
Thanks a lot @cstyan. I've removed temporarily the option to store the binary (raw) block data. I suggest to add an option to save the original log data to file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the diff it seems like there's a lot of code removal that doesn't seem relevant to the change you're wanting to make?
cmd/chunks-inspect/loki.go
Outdated
"encoding/binary" | ||
"fmt" | ||
"hash/crc32" | ||
"github.com/grafana/loki/pkg/chunkenc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also have a lint rule that will say that this needs to be at the bottom of the import list with a newline in between it and the stdlib imports.
There's still some lint failures
I think those are viewable if you click |
I'm sorry @cstyan. Will check and fix it hopefully in a couple of days. |
4f6dae9
to
9006fc1
Compare
Hello @cstyan, I fixed the lint issues. The I also had a look into reading the header of the chunk with existing implementation but did not found the piece of code yet. I will try with the local storage client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we just need to merge the latest main into your branch and then we're good to go
9006fc1
to
a471ec9
Compare
Sounds great @cstyan. I re-based the commit to latest main and updated the commit message. |
sorry @agebhar1 it looks like there was a CI change merged yesterday afternoon NA time so I think we need one more rebase on top of main 🙇 |
a471ec9
to
ad92220
Compare
No worries @cstyan. I did the rebase. |
Thanks for your patience @agebhar1, the build is green but one last thing, I don't think we should be removing the |
Hi @cstyan, if I try to add add the module github.com/grafana/loki/cmd/chunks-inspect
go 1.20 and run $ go mod tidy
go: finding module for package github.com/grafana/loki/pkg/logql/log
go: finding module for package github.com/golang/snappy
go: finding module for package github.com/grafana/loki/pkg/chunkenc
go: found github.com/golang/snappy in github.com/golang/snappy v0.0.4
go: found github.com/grafana/loki/pkg/chunkenc in github.com/grafana/loki v1.6.1
go: finding module for package github.com/grafana/loki/pkg/logql/log
go: github.com/grafana/loki/cmd/chunks-inspect imports
github.com/grafana/loki/pkg/logql/log: module github.com/grafana/loki@latest found (v1.6.1), but does not contain package github.com/grafana/loki/pkg/logql/log All other (loki, logcli, ...) usees the top level module description. |
ad92220
to
dd0bb42
Compare
Hi @cstyan - I've re-added the |
We use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by what seems like removal of one feature but inclusion of another (file output) but both seem unrelated to your original goal of being able to inspect a chunk and see the structured metadata in the output.
cmd/chunks-inspect/main.go
Outdated
@@ -17,28 +21,22 @@ var timezone = time.UTC | |||
func main() { | |||
blocks := flag.Bool("b", false, "print block details") | |||
lines := flag.Bool("l", false, "print log lines") | |||
storeBlocks := flag.Bool("s", false, "store blocks, using input filename, and appending block index to it") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this command option removed intentionally? if it's just a rename of storeBlocks
-> export
we should keep it as storeBlocks
since we're explicitly creating loki blocks, but in reality it looks like we've removed the export to block functionality but added block -> plaintext file support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was intentionally, because the raw block data is not available by the Block
interface, which is returned by the Chunk
interface implementation of Blocks for MemChunk
.
The Block interface:
loki/pkg/chunkenc/interface.go
Lines 154 to 168 in 88c6711
// Block is a chunk block. | |
type Block interface { | |
// MinTime is the minimum time of entries in the block | |
MinTime() int64 | |
// MaxTime is the maximum time of entries in the block | |
MaxTime() int64 | |
// Offset is the offset/position of the block in the chunk. Offset is unique for a given block per chunk. | |
Offset() int | |
// Entries is the amount of entries in the block. | |
Entries() int | |
// Iterator returns an entry iterator for the block. | |
Iterator(ctx context.Context, pipeline log.StreamPipeline) iter.EntryIterator | |
// SampleIterator returns a sample iterator for the block. | |
SampleIterator(ctx context.Context, extractor log.StreamSampleExtractor) iter.SampleIterator | |
} |
The export of the raw block data was removed. But as of writing, it could be possible to implement this by using the offset information from the Block
interface. WDYT?
I added the export for the original ingested log data which might by helpful in case of recovery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep the existing functionality unless it's really not possible to. Even if it means keeping some duplicated code, lets do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will give it a try.
I did some screenshots to compare the output from the current implementation w/ the one from this PR. Chunk format v3 w/ single block (1) Checksum is not available Chunk format v3 w/ multiple blocks (4) Block entry un/compressed size is not available by any method Chunk format v4 w/ single block (6) Checksum calculation is broken due to v4 memory layout change |
dd0bb42
to
b5fce50
Compare
Sorry, I'm not sure what I'm looking at here? The changes in this PR means we can no longer parse and print v3 or v4 chunks as we previously could? |
b5fce50
to
d65a7c4
Compare
Hi @cstyan! The export of compressed/uncompressed data (-s) is back by copying a bit of the internal code. The output for a v4 chunk contains now the structured metadata per line: $ ./chunks-inspect -b -l MThjODE0NzYwY2I6MThjODE0NzYwY2M6N2QzYjVmYzU=
Chunks file: MThjODE0NzYwY2I6MThjODE0NzYwY2M6N2QzYjVmYzU=
Metadata length: 257
Data length: 259
UserID: fake
From: 2023-12-19 08:53:07.915000 UTC
Through: 2023-12-19 08:53:07.916000 UTC (1ms)
Labels:
__name__ = logs
filename = /mnt/log/structured_metadata.log
foo = bar
job = various
stream = stderr
Format (Version): 4
Encoding: snappy
Found 1 block(s)
Minimum time (from first block): 2023-12-19 08:53:07.915170 UTC
Maximum time (from last block): 2023-12-19 08:53:07.915170 UTC
Block 0: position: 50, original length: 145, minT: 2023-12-19 08:53:07.915170 UTC maxT: 2023-12-19 08:53:07.915170 UTC
2023-12-19 08:53:07.915170 UTC { "log": "log message\n", "stream": "stderr", "traceID": "0242ac120002", "time": "2019-04-30T02:12:41.8443515Z" }
Structured Metadata:
traceID = 0242ac120002
Total chunk size of uncompressed data: 140 compressed data: 259 ratio: 0.541 |
@agebhar1 looks like you have some linting failures I'll let @slim-bean review the chunk parsing changes since he opened that other PR as well. |
@cstyan will fix it |
d65a7c4
to
d7ae1b9
Compare
I fixed the lint error @cstyan. |
What this PR does / why we need it:
Support chunk format v4 (aka structured metadata) for
chunks-inspect
.Which issue(s) this PR fixes:
Fixes #10767
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)CHANGELOG.md
updatedadd-to-release-notes
labeldocs/sources/setup/upgrade/_index.md
production/helm/loki/Chart.yaml
and updateproduction/helm/loki/CHANGELOG.md
andproduction/helm/loki/README.md
. Example PRdeprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR