Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pin chunk and index format to schema version. #10213

Merged
merged 25 commits into from Aug 24, 2023
Merged

Conversation

kavirajk
Copy link
Collaborator

@kavirajk kavirajk commented Aug 10, 2023

We pin all three Chunk, HeadBlock and TSDB Version to schema version in period config.

This is the following mapping (after being discussed with @owen-d and @sandeepsukhani )

v12 (current existing schema) - ChunkFormatV3 (UnorderedHeadBlock) + TSDBv2
v13 (introducing new schema) - ChunkFormatV4 (UnorderedWithNonIndexedLabelsHeadBlockFmt) + TSDBv3

Note the new schema v13 supports the latest chunk and index format.

NOTES for Reviewer

  1. General approach is we removed the idea of index.LiveFormat, chunkenc.DefaultChunkFormat and chunkenc.DefaultHeadBlockFmt and made following two changes. These variables were used before to tie chunk and tsdb formats specific to Loki versions. This PR remove that coupling and pin these formats to schema version instead.

    1. These variables were replaced with explicit chunk and index formats within those packages (and it's tests)
    2. If these variables were used outside it's own packages say by ingester, compactor, etc. Then we extract correct chunk and index versions from the schema config.
  2. Add two methods to periodConfig. (1) ChunkFormat() returning chunk and head format tied to schema (2) TSDBFormat() returning tsdb format tied to schema.

  3. Other ideas I thought of doing but didn't end up doing is make ChunkFormat and IndexFormat as separate type (rather than byte and int currently. Similar to HeadBlockFmt type). But didnt' do it eventually to keep the PR small and don't want to complicate with lots of changes.

  4. Moved couple of test cases from chunkenc to config package, because the test case was actually testing methods on schemaconfig and it was creating cycling dependencies.

@kavirajk kavirajk changed the title [WIP]: Tie chunk and index format to schema version. [WIP]: Pin chunk and index format to schema version. Aug 10, 2023
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
…fig`

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@kavirajk kavirajk changed the title [WIP]: Pin chunk and index format to schema version. Pin chunk and index format to schema version. Aug 21, 2023
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@JStickler
Copy link
Contributor

Are there any updates we need to make to the documentation when this change goes through?
We document the chunk format in https://grafana.com/docs/loki/next/get-started/architecture/.
1 - Is the current chunk format correct (IIRC there was a recent update to this section of the docs)
2 - Do we need to add the schema version number to the docs?

@kavirajk
Copy link
Collaborator Author

Is the current chunk format correct (IIRC there was a recent update to this section of the docs)

I think you mean the non-index labels changes went into ChunkV4. I think yes. @sandeepsukhani @vlad-diachenko @salvacorts worked on this.

Do we need to add the schema version number to the docs?

Yes. Schema v13 needs to be added. I will do that either in this PR or follow up PR.

pkg/chunkenc/memchunk.go Outdated Show resolved Hide resolved
pkg/ingester/checkpoint.go Outdated Show resolved Hide resolved
pkg/storage/config/schema_config.go Outdated Show resolved Hide resolved
pkg/storage/stores/tsdb/index/index.go Show resolved Hide resolved
Copy link
Contributor

@salvacorts salvacorts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! Thank you Kavi, this will ease our lives a lot when implementing and rolling out new chunk formats 🚀

I left out some minor comments.

pkg/chunkenc/memchunk.go Outdated Show resolved Hide resolved
pkg/chunkenc/memchunk_test.go Outdated Show resolved Hide resolved
pkg/chunkenc/memchunk_test.go Outdated Show resolved Hide resolved
@@ -594,7 +611,7 @@ func (s *stream) resetCounter() {

func headBlockType(unorderedWrites bool) chunkenc.HeadBlockFmt {
if unorderedWrites {
return chunkenc.DefaultHeadBlockFmt
return chunkenc.UnorderedWithNonIndexedLabelsHeadBlockFmt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it would be good to have a DefaultUnorderedHeadFmt const to use in this case and similar cases (if any) that way, we don't need to manually find and replace these occurrences but just the constant.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea. Only problem is once we export such variables outside the chunkenc, people use it instead of taking it from schema config.

Besides, particularly here this unordered head fmt should also depends on either chunkv3 or chunkv4 (cannot use normal UnorderedBlockFmt with chunk4 for example).

I handled it differently by exporting ChunkHeadFormatFor(chunkfmt) api from chunkenc package.

pkg/ingester/stream_test.go Outdated Show resolved Hide resolved
pkg/storage/config/schema_config.go Outdated Show resolved Hide resolved
pkg/storage/hack/main.go Outdated Show resolved Hide resolved
pkg/storage/lazy_chunk_test.go Outdated Show resolved Hide resolved
pkg/storage/stores/tsdb/head_manager_test.go Outdated Show resolved Hide resolved
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Kavi! The changes look good to me, modulo Salva's suggestions which are yet to be resolved. While reviewing the code, I had similar concerns about having to update the tests outside of the chunkenc package to use the latest chunk format. I think we can put a LatestPeriodConfig or RecommendedPeriodConfig somewhere for the tests and use it for tests building chunks. The chunkenc package can then have most tests running against all supported chunk formats.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@kavirajk
Copy link
Collaborator Author

I think we can put a LatestPeriodConfig or RecommendedPeriodConfig somewhere for the tests and use it for tests building chunks. The chunkenc package can then have most tests running against all supported chunk formats.

Thanks for the review @sandeepsukhani. Agree.

Basically I tweaked tests inside chunkenc to run tests against all supported chunk formats and outside the chunkenc I tweaked tests to run against all recent schemas (v11, v12 and v13)

Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -210,7 +210,7 @@ func (r *ingesterRecoverer) Close() {
s.unorderedWrites = isAllowed

if !isAllowed && old {
err := s.chunks[len(s.chunks)-1].chunk.ConvertHead(headBlockType(isAllowed))
err := s.chunks[len(s.chunks)-1].chunk.ConvertHead(headBlockType(s.chunkFormat, isAllowed))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

headBlockType actually doesn't make sense to me because isAllowed here is always going to be false. Anyways, not going to block the PR on this.

@kavirajk
Copy link
Collaborator Author

Will merge this once getting some reviews from @owen-d as well.

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One fix needed then lgtm

schemaconfig *config.SchemaConfig

// precomputed `Chunk` and `Head` formats parsed from schemaconfig
chunkfmt byte
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be detected when writing a chunk within the stream, not on the instance. The instance may be exist across schema boundaries. This leaves us with two choices:

  1. set the format based on the min timestamp in a chunk when it's cut
  2. set the format based on the max timestamp in a chunk when it's cut

I think (1) is the more conservative approach and thus more appropriate. We may write an older format to a newer schema bucket, but we won't write a newer format to an older bucket. To be exact, the chunk is written once then added to the index buckets for different days. I'd rather us have some v13-schema indices reference v12 compatible chunks than vice versa.

Copy link
Collaborator Author

@kavirajk kavirajk Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. thanks @owen-d for the explanation.

In fact my initial commit used "per-stream" chunk format selection, but was using model.Now() instead of min/max timestamp. Makes more sense to use correct timestamp here.

Also agree with min timestamp.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@kavirajk kavirajk merged commit bbfb13c into main Aug 24, 2023
4 checks passed
@kavirajk kavirajk deleted the kavirajk/schema-v13 branch August 24, 2023 07:52
@grafanabot
Copy link
Collaborator

Hello @kavirajk!
Backport pull requests need to be either:

  • Pull requests which address bugs,
  • Urgent fixes which need product approval, in order to get merged,
  • Docs changes.

Please, if the current pull request addresses a bug fix, label it with the type/bug label.
If it already has the product approval, please add the product-approved label. For docs changes, please add the type/docs label.
If the pull request modifies CI behaviour, please add the type/ci label.
If none of the above applies, please consider removing the backport label and target the next major/minor release.
Thanks!

grafanabot pushed a commit that referenced this pull request Aug 24, 2023
We pin all three `Chunk`, `HeadBlock` and `TSDB` Version to `schema`
version in period config.

This is the following mapping (after being discussed with @owen-d and
@sandeepsukhani )

`v12` (current existing schema) - ChunkFormatV3 (UnorderedHeadBlock) +
TSDBv2
`v13` (introducing new schema) - ChunkFormatV4
(UnorderedWithNonIndexedLabelsHeadBlockFmt) + TSDBv3

Note the new schema `v13` supports the latest chunk and index format.

**NOTES for Reviewer**

1. General approach is we removed the idea of `index.LiveFormat`,
`chunkenc.DefaultChunkFormat` and `chunkenc.DefaultHeadBlockFmt` and
made following two changes. These variables were used before to tie
chunk and tsdb formats specific to Loki versions. This PR remove that
coupling and pin these formats to `schema` version instead.

1. These variables were replaced with explicit chunk and index formats
within those packages (and it's tests)
2. If these variables were used outside it's own packages say by
ingester, compactor, etc. Then we extract correct chunk and index
versions from the `schema` config.

2. Add two methods to `periodConfig`. (1) `ChunkFormat()` returning
chunk and head format tied to schema (2) `TSDBFormat()` returning tsdb
format tied to schema.

2. Other ideas I thought of doing but didn't end up doing is make
`ChunkFormat` and `IndexFormat` as separate type (rather than `byte` and
`int` currently. Similar to `HeadBlockFmt` type). But didnt' do it
eventually to keep the PR small and don't want to complicate with lots
of changes.

4. Moved couple of test cases from `chunkenc` to `config` package,
because the test case was actually testing methods on `schemaconfig` and
it was creating cycling dependencies.

---------

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
(cherry picked from commit bbfb13c)
kavirajk added a commit that referenced this pull request Aug 24, 2023
…10333)

Backport bbfb13c from #10213

---

We pin all three `Chunk`, `HeadBlock` and `TSDB` Version to `schema`
version in period config.


This is the following mapping (after being discussed with @owen-d and
@sandeepsukhani )

`v12` (current existing schema) - ChunkFormatV3 (UnorderedHeadBlock) +
TSDBv2
`v13` (introducing new schema) - ChunkFormatV4
(UnorderedWithNonIndexedLabelsHeadBlockFmt) + TSDBv3

Note the new schema `v13` supports the latest chunk and index format.

**NOTES for Reviewer**

1. General approach is we removed the idea of `index.LiveFormat`,
`chunkenc.DefaultChunkFormat` and `chunkenc.DefaultHeadBlockFmt` and
made following two changes. These variables were used before to tie
chunk and tsdb formats specific to Loki versions. This PR remove that
coupling and pin these formats to `schema` version instead.

1. These variables were replaced with explicit chunk and index formats
within those packages (and it's tests)
2. If these variables were used outside it's own packages say by
ingester, compactor, etc. Then we extract correct chunk and index
versions from the `schema` config.

2. Add two methods to `periodConfig`. (1) `ChunkFormat()` returning
chunk and head format tied to schema (2) `TSDBFormat()` returning tsdb
format tied to schema.

2. Other ideas I thought of doing but didn't end up doing is make
`ChunkFormat` and `IndexFormat` as separate type (rather than `byte` and
`int` currently. Similar to `HeadBlockFmt` type). But didnt' do it
eventually to keep the PR small and don't want to complicate with lots
of changes.

4. Moved couple of test cases from `chunkenc` to `config` package,
because the test case was actually testing methods on `schemaconfig` and
it was creating cycling dependencies.

Co-authored-by: Kaviraj Kanagaraj <kavirajkanagaraj@gmail.com>
grafanabot added a commit that referenced this pull request Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants