Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(chunks-inspect): support structured metadata #11506

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

agebhar1
Copy link
Contributor

What this PR does / why we need it:

Support chunk format v4 (aka structured metadata) for chunks-inspect.

Which issue(s) this PR fixes:
Fixes #10767

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@agebhar1 agebhar1 requested a review from a team as a code owner December 18, 2023 09:16
Copy link
Contributor

Trivy scan found the following vulnerabilities:

  • HIGH, Target: docker.io/grafana/loki:main-724a841 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
  • HIGH, Target: docker.io/grafana/loki:main-724a841 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0
    \nTo see more details on these vulnerabilities, and how/where to fix them, please run docker build -t grafana/loki:main-724a841 -f cmd/loki/Dockerfile .
    trivy i grafana/loki:main-724a841 on your branch. If these were not introduced by your PR, please considering fixing them in via a subsequent PR. Thanks!

Comment on lines 129 to 71
c, _ := chunkenc.NewByteChunk(data, 0, 0)
bs := c.Blocks(from, through)
fmt.Println("Blocks: ", len(bs))

pipeline := log.NewNoopPipeline()
for idx, block := range bs {
fmt.Println("Block : ", idx)
fmt.Println("MinTime: ", time.Unix(0, block.MinTime()).UTC())
fmt.Println("MaxTime: ", time.Unix(0, block.MaxTime()).UTC())
fmt.Println("Offset : ", block.Offset())
iter := block.Iterator(context.Background(), pipeline.ForStream(nil))
for iter.Next() {
e := iter.Entry()
fmt.Println(e.Timestamp.UTC(), " ", e.Line)
for _, meta := range e.StructuredMetadata {
fmt.Println("\t", meta.Name, "=", meta.Value)
}
}
}

err := c.Close()
if err != nil {
return nil, err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a reasonable change, but looking at the structure of the rest of the code I think printing of all these details would be done in main.go, it looks to me like process just builds the details of the chunk into something that main.go prints

I haven't been keeping up with structured metadata. If it's separate from a chunk then imo we should either have a separate function to gather and return the structured metadata to main.go or an additional return value in parseLokiChunk for the metadata, perhaps with a parameter to choose whether or not you want to receive info about the structured metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cstyan for your feedback! I will restructure the code and go forward.

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch 2 times, most recently from 79694b3 to 2423fba Compare February 4, 2024 18:25
@pull-request-size pull-request-size bot added size/L and removed size/M labels Feb 4, 2024
@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from 2423fba to d803d44 Compare February 4, 2024 18:30
@agebhar1
Copy link
Contributor Author

agebhar1 commented Feb 4, 2024

Hi @cstyan! I've moved the output to the printFile function. The parseLokiChunk now returns only the chunk data. Is this the right direction?

@cstyan
Copy link
Contributor

cstyan commented Feb 6, 2024

I can do another review sometime this week 👍 for now it looks like there's some linting failures

cmd/chunks-inspect/main.go:30:59: unused-parameter: parameter 'storeBlocks' seems to be unused, consider removing or renaming it as _ (revive)
func printFile(filename string, blockDetails, printLines, storeBlocks bool) {
                                                          ^
cmd/chunks-inspect/time.go:37:36: unnecessary conversion (unconvert)
		v, err := strconv.ParseInt(string(p[0]), 10, 64)
		                                 ^
cmd/chunks-inspect/time.go:44:36: unnecessary conversion (unconvert)
		v, err := strconv.ParseInt(string(p[0]), 10, 64)

@agebhar1
Copy link
Contributor Author

agebhar1 commented Feb 7, 2024

Thanks a lot @cstyan.

I've removed temporarily the option to store the binary (raw) block data. I suggest to add an option to save the original log data to file.

Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the diff it seems like there's a lot of code removal that doesn't seem relevant to the change you're wanting to make?

"encoding/binary"
"fmt"
"hash/crc32"
"github.com/grafana/loki/pkg/chunkenc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also have a lint rule that will say that this needs to be at the bottom of the import list with a newline in between it and the stdlib imports.

@cstyan
Copy link
Contributor

cstyan commented Feb 7, 2024

There's still some lint failures

cmd/chunks-inspect/header.go:1: : # github.com/grafana/loki/cmd/chunks-inspect
cmd/chunks-inspect/loki.go:11:19: undefined: chunkenc
cmd/chunks-inspect/loki.go:14:21: undefined: chunkenc
cmd/chunks-inspect/loki.go:45:10: undefined: chunkenc
cmd/chunks-inspect/main.go:73:14: undefined: logql (typecheck)

I think those are viewable if you click details on the checks item here on github?

@agebhar1
Copy link
Contributor Author

agebhar1 commented Feb 8, 2024

I'm sorry @cstyan. Will check and fix it hopefully in a couple of days.

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch 2 times, most recently from 4f6dae9 to 9006fc1 Compare February 10, 2024 18:00
@agebhar1
Copy link
Contributor Author

Hello @cstyan,

I fixed the lint issues. The chunk-inspect tries to use as much as possible from the existing implementation. This should improve it's maintainability for added features like structured metadata. Reusing the implementation comes with some downsides such as the checksum for the blocks are not known anymore and also the raw block data. Instead of saving the block data it's now possible to export the original log data.

I also had a look into reading the header of the chunk with existing implementation but did not found the piece of code yet. I will try with the local storage client.

Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just need to merge the latest main into your branch and then we're good to go

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from 9006fc1 to a471ec9 Compare February 20, 2024 17:30
@agebhar1
Copy link
Contributor Author

Sounds great @cstyan. I re-based the commit to latest main and updated the commit message.

@agebhar1 agebhar1 changed the title [WIP] feat(chunks-inspect): support structured metadata feat(chunks-inspect): support structured metadata Feb 20, 2024
@cstyan
Copy link
Contributor

cstyan commented Feb 21, 2024

sorry @agebhar1 it looks like there was a CI change merged yesterday afternoon NA time so I think we need one more rebase on top of main 🙇

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from a471ec9 to ad92220 Compare February 22, 2024 06:23
@agebhar1
Copy link
Contributor Author

No worries @cstyan. I did the rebase.

@cstyan
Copy link
Contributor

cstyan commented Feb 23, 2024

Thanks for your patience @agebhar1, the build is green but one last thing, I don't think we should be removing the go.sum and go.mod for chunks-inspect. Could you add those back/update the go.mod to use a go 1.20 version?

@agebhar1
Copy link
Contributor Author

Hi @cstyan, if I try to add add the go.mod to the directory cmd/chunks-inspect with

module github.com/grafana/loki/cmd/chunks-inspect

go 1.20

and run go mod tidy I get the following error:

$ go mod tidy
go: finding module for package github.com/grafana/loki/pkg/logql/log
go: finding module for package github.com/golang/snappy
go: finding module for package github.com/grafana/loki/pkg/chunkenc
go: found github.com/golang/snappy in github.com/golang/snappy v0.0.4
go: found github.com/grafana/loki/pkg/chunkenc in github.com/grafana/loki v1.6.1
go: finding module for package github.com/grafana/loki/pkg/logql/log
go: github.com/grafana/loki/cmd/chunks-inspect imports
        github.com/grafana/loki/pkg/logql/log: module github.com/grafana/loki@latest found (v1.6.1), but does not contain package github.com/grafana/loki/pkg/logql/log

All other (loki, logcli, ...) usees the top level module description.

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from ad92220 to dd0bb42 Compare May 16, 2024 17:02
@agebhar1
Copy link
Contributor Author

Hi @cstyan - I've re-added the go.mod with the current go version which is used by loki itself.

@cstyan
Copy link
Contributor

cstyan commented May 16, 2024

We use go 1.22.2 for loki now but that's not immediately clear when looking at some files in the repo, we could make that update for the chunks-inspect tool later. I'll do a review here today/tomorrow, thanks for your patience @agebhar1 .

Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by what seems like removal of one feature but inclusion of another (file output) but both seem unrelated to your original goal of being able to inspect a chunk and see the structured metadata in the output.

@@ -17,28 +21,22 @@ var timezone = time.UTC
func main() {
blocks := flag.Bool("b", false, "print block details")
lines := flag.Bool("l", false, "print log lines")
storeBlocks := flag.Bool("s", false, "store blocks, using input filename, and appending block index to it")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this command option removed intentionally? if it's just a rename of storeBlocks -> export we should keep it as storeBlocks since we're explicitly creating loki blocks, but in reality it looks like we've removed the export to block functionality but added block -> plaintext file support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was intentionally, because the raw block data is not available by the Block interface, which is returned by the Chunk interface implementation of Blocks for MemChunk.

The Block interface:

// Block is a chunk block.
type Block interface {
// MinTime is the minimum time of entries in the block
MinTime() int64
// MaxTime is the maximum time of entries in the block
MaxTime() int64
// Offset is the offset/position of the block in the chunk. Offset is unique for a given block per chunk.
Offset() int
// Entries is the amount of entries in the block.
Entries() int
// Iterator returns an entry iterator for the block.
Iterator(ctx context.Context, pipeline log.StreamPipeline) iter.EntryIterator
// SampleIterator returns a sample iterator for the block.
SampleIterator(ctx context.Context, extractor log.StreamSampleExtractor) iter.SampleIterator
}

The export of the raw block data was removed. But as of writing, it could be possible to implement this by using the offset information from the Block interface. WDYT?

I added the export for the original ingested log data which might by helpful in case of recovery.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep the existing functionality unless it's really not possible to. Even if it means keeping some duplicated code, lets do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will give it a try.

cmd/chunks-inspect/main.go Outdated Show resolved Hide resolved
@agebhar1
Copy link
Contributor Author

I did some screenshots to compare the output from the current implementation w/ the one from this PR.

Chunk format v3 w/ single block
chunks-inspect v3 a

(1) Checksum is not available
(2) Was removed but I can't remember why and can be restored
(3) Changed to the chunk data itself (without metadata and header), both are available by UncompressedSize and CompressedSize

Chunk format v3 w/ multiple blocks
chunks-inspect v3 b

(4) Block entry un/compressed size is not available by any method
(5) same for checksum

Chunk format v4 w/ single block
chunks-inspect v4

(6) Checksum calculation is broken due to v4 memory layout change
(7) Log data is not shown due to changed memory layout
(8) PR prints structured metadata per log entry

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from dd0bb42 to b5fce50 Compare May 18, 2024 17:40
@cstyan
Copy link
Contributor

cstyan commented May 21, 2024

I did some screenshots to compare the output from the current implementation w/ the one from this PR.

Sorry, I'm not sure what I'm looking at here?

The changes in this PR means we can no longer parse and print v3 or v4 chunks as we previously could?

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from b5fce50 to d65a7c4 Compare June 1, 2024 14:01
@agebhar1
Copy link
Contributor Author

agebhar1 commented Jun 1, 2024

Hi @cstyan! The export of compressed/uncompressed data (-s) is back by copying a bit of the internal code. The output for a v4 chunk contains now the structured metadata per line:

$ ./chunks-inspect -b -l MThjODE0NzYwY2I6MThjODE0NzYwY2M6N2QzYjVmYzU=
Chunks file: MThjODE0NzYwY2I6MThjODE0NzYwY2M6N2QzYjVmYzU=
Metadata length: 257
Data length: 259
UserID: fake
From: 2023-12-19 08:53:07.915000 UTC
Through: 2023-12-19 08:53:07.916000 UTC (1ms)
Labels:
         __name__ = logs
         filename = /mnt/log/structured_metadata.log
         foo = bar
         job = various
         stream = stderr
Format (Version): 4
Encoding: snappy
Found 1 block(s)
Minimum time (from first block): 2023-12-19 08:53:07.915170 UTC
Maximum time (from last block): 2023-12-19 08:53:07.915170 UTC

Block    0: position:       50, original length:    145, minT: 2023-12-19 08:53:07.915170 UTC maxT: 2023-12-19 08:53:07.915170 UTC
2023-12-19 08:53:07.915170 UTC  { "log": "log message\n", "stream": "stderr", "traceID": "0242ac120002", "time": "2019-04-30T02:12:41.8443515Z" }
Structured Metadata:
         traceID = 0242ac120002
Total chunk size of uncompressed data: 140 compressed data: 259 ratio: 0.541

@cstyan
Copy link
Contributor

cstyan commented Jun 5, 2024

@agebhar1 looks like you have some linting failures

I'll let @slim-bean review the chunk parsing changes since he opened that other PR as well.

@agebhar1
Copy link
Contributor Author

agebhar1 commented Jun 6, 2024

@cstyan will fix it

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from d65a7c4 to d7ae1b9 Compare June 6, 2024 16:23
@agebhar1
Copy link
Contributor Author

agebhar1 commented Jun 6, 2024

I fixed the lint error @cstyan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add structured_metadata support to the output of chunks-inspect cli
2 participants