Blocks can contain chunks that go past MaxTime #347
In order to check for blocks with "outsider" chunks, I use this script:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/prometheus/tsdb"
	"github.com/prometheus/tsdb/chunks"
	"github.com/prometheus/tsdb/index"
	"github.com/prometheus/tsdb/labels"
)

// ms converts a millisecond timestamp to time.Time for readable output.
func ms(t int64) time.Time {
	return time.Unix(0, t*1e6)
}

func check(dir string) {
	block, err := tsdb.OpenBlock(dir, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer block.Close()

	meta := block.Meta()
	fmt.Printf("block=%s, MinTime=%s, MaxTime=%s, compaction=%d\n", meta.ULID, ms(meta.MinTime), ms(meta.MaxTime), meta.Compaction.Level)

	r, err := block.Index()
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	p, err := r.Postings(index.AllPostingsKey())
	if err != nil {
		log.Fatal(err)
	}

	var (
		lset labels.Labels
		chks []chunks.Meta
	)
	for p.Next() {
		if err := r.Series(p.At(), &lset, &chks); err != nil {
			log.Fatal(err)
		}
		// Report chunks that fall outside the block's [MinTime, MaxTime] range.
		for _, c := range chks {
			if c.MinTime < meta.MinTime || meta.MaxTime < c.MaxTime {
				fmt.Printf("posting=%d, MinTime=%s, MaxTime=%s\n", p.At(), ms(c.MinTime), ms(c.MaxTime))
			}
		}
	}
}

func main() {
	for _, dir := range os.Args[1:] {
		check(dir)
	}
}
```
This is a nice finding! Can you propose BenoitKnecht@23b26e7 in a PR? That would maybe speed things up. Even if it's not ready to go, we can iterate on it.
@fabxc I think this issue is actually the real root cause of this: thanos-io/thanos#183

Will propose a PR to fix this here.
Reduce `block.MaxTime` by 1 millisecond so that `blocks[1].MinTime == blocks[0].MaxTime + 1`. This prevents chunks that start exactly at the block end (i.e. `chunk.MinTime == block.MaxTime`) from being included in that block. See prometheus-junkyard#347 for a detailed explanation.
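A minimal sketch of the arithmetic behind this fix (the helper name `rangeForTimestamp` and its signature are hypothetical; the real tsdb code differs):

```go
package main

import "fmt"

// rangeForTimestamp is a hypothetical helper mirroring the fix: for a
// timestamp t and block width (both in milliseconds), return the block's
// [MinTime, MaxTime] range. Before the fix, MaxTime was (t/width+1)*width,
// so blocks[0].MaxTime equalled blocks[1].MinTime; subtracting 1 millisecond
// means adjacent blocks no longer share an edge.
func rangeForTimestamp(t, width int64) (minT, maxT int64) {
	minT = (t / width) * width
	maxT = minT + width - 1 // the "-1" is the fix
	return minT, maxT
}

func main() {
	const width = int64(2 * 60 * 60 * 1000) // 2h in milliseconds
	min0, max0 := rangeForTimestamp(0, width)
	min1, max1 := rangeForTimestamp(width, width)
	fmt.Println(min0, max0) // 0 7199999
	fmt.Println(min1, max1) // 7200000 14399999
	fmt.Println(min1 == max0+1) // true: blocks[1].MinTime == blocks[0].MaxTime + 1
}
```

With these ranges, a chunk cut at `t = 7200000` can only satisfy the inclusion check for the second block.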
@Bplotka Sorry, I didn't see your message before submitting my PR. Feel free to comment on mine or open a separate one with your fix, if you think it's appropriate.
No no, it's super nice you added yours, because I tried a totally different approach and I'm not sure which one is better.
But this is not ideal either, because technically when you are asking for

At the same time, I am not sure if the approach from your PR works. For example, based on your explanation here:

This is very true,
OK, I missed the fact that chunks are actually cut in a way that ensures they don't cross boundaries. So we could potentially do as proposed in PR #348. I commented there why I think it might be risky. Maybe we should add proper handling on compaction instead? EDIT: just not allowing these chunks into the first block seems OK, since block
TSDB issue: prometheus-junkyard/tsdb#347

How we handle it?
- segregate this issue in a special stat entry in the verifier.
- auto-fix broken plan blocks before Thanos compaction.
- add a repair job to run offline batch repairs for individual blocks.

NOTE: At this point we have no power to fix the bug when someone uses local compaction ):

Fixes #354

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
@BenoitKnecht, @krasi-georgiev was asking a really good question: why are blocks overlapping in the first place? We need to know the answer, and whether changing blocks to actually be a millisecond apart breaks anything. @krasi-georgiev is right that our attempts to fix it are only workarounds... ): Any ideas? @fabxc, do you remember why there's a range inconsistency between blocks and chunks? And is it safe to make blocks non-overlapping, like chunks?
Yes, that's how I intended to fix this issue initially. It makes things much easier in tsdb, because all intervals would be treated the same way (see #348 (comment)). But I was told that treating blocks as

I'm happy to revert to the initial implementation though, if the consensus is that it's the better approach.
I think we should merge your PR to actually fix the duplicated-chunk bug, and aim for fixing the inconsistency (or figuring out why it's there) as next steps.
…e. (#375) TSDB issue: prometheus-junkyard/tsdb#347 How we handle it? - segregate this issue in special stat entry in verifier. - auto-fix broken plan block before thanos compaction. - adding repair job to run offline batch repair for indivdual blocks. NOTE: At this point we have no power of fixing the bug when someone uses local compaction ): Fixes #354 Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
Due to the way blocks used to overlap by 1 millisecond (see prometheus-junkyard#347), when requesting a 2-hour interval starting at `blocks[1].MinTime`, the `Querier` would consider three blocks: `blocks[0]`, `blocks[1]` and `blocks[2]`, because `blocks[0].MaxTime` and `blocks[2].MinTime` were in that interval. However, if the blocks don't overlap, only two blocks should be returned: `blocks[1]` and `blocks[2]`. This test ensures that it's indeed the case. Signed-off-by: Benoît Knecht <benoit.knecht@fsfe.org>
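The block-selection behaviour described above can be sketched as follows (the `blockRange`, `overlaps`, and `selectBlocks` names are hypothetical stand-ins for the tsdb `Querier` internals):

```go
package main

import "fmt"

type blockRange struct{ minT, maxT int64 }

// overlaps mirrors tsdb's inclusive interval-overlap check: two closed
// intervals overlap if they share at least one point.
func overlaps(b blockRange, mint, maxt int64) bool {
	return b.minT <= maxt && mint <= b.maxT
}

// selectBlocks counts how many blocks a query over [mint, maxt] would touch.
func selectBlocks(bs []blockRange, mint, maxt int64) (n int) {
	for _, b := range bs {
		if overlaps(b, mint, maxt) {
			n++
		}
	}
	return n
}

func main() {
	const h2 = int64(7200000) // 2h in ms
	// Old layout: adjacent blocks share an edge (overlap by 1 ms).
	old := []blockRange{{0, h2}, {h2, 2 * h2}, {2 * h2, 3 * h2}}
	// Fixed layout: blocks[i+1].minT == blocks[i].maxT + 1.
	fixed := []blockRange{{0, h2 - 1}, {h2, 2*h2 - 1}, {2 * h2, 3*h2 - 1}}

	// Query a 2-hour interval starting at blocks[1].MinTime.
	fmt.Println(selectBlocks(old, h2, 2*h2))   // 3: blocks[0] and blocks[2] edges fall in range
	fmt.Println(selectBlocks(fixed, h2, 2*h2)) // 2: only blocks[1] and blocks[2]
}
```

This reproduces the test's expectation: with the old 1 ms overlap the querier touched three blocks, while non-overlapping blocks reduce it to two.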
I've been investigating thanos-io/thanos#354, and I think I've finally identified what's going on.

The issue came up because Thanos checks that all the chunks in a block are between `block.MinTime` and `block.MaxTime`, but in some circumstances, it detected chunks with `block.MaxTime < chunk.MaxTime`. Those chunks also happened to satisfy `block.MaxTime == chunk.MinTime`, which we'll see is no coincidence. After suspecting that the issue came from Thanos, I'm now pretty sure it's in tsdb.

New chunks are created when appending data to the head using `append()`, specifically this part:

where `t` is the timestamp of the data being added, and `s.nextAt` can be equal to `block.MaxTime`. So if Prometheus happens to scrape a metric whose timestamp is equal to `block.MaxTime`, it will create a chunk with `chunk.MinTime == block.MaxTime`.

Now, when the head needs to be persisted to disk in a new block, this function determines which chunks are part of the block, specifically:

with

So a chunk is included in the block if it overlaps it in any way, including the case where `chunk.MinTime == block.MaxTime`.

This isn't ideal, because most of the data in that last chunk is useless (it won't be read, since it's outside the range `[block.MinTime, block.MaxTime]`), and the chunk is going to be part of the next block by the same criteria (it overlaps it as well), so we end up storing the same data twice.

It seems to me that the whole issue stems from the fact that, by the way `intervalOverlap()` is defined, blocks overlap by 1 millisecond. For instance, with 2-hour blocks, if `blocks[0].MinTime = 0` and `blocks[0].MaxTime = 7200000`, then the next block will be `blocks[1].MinTime = 7200000`, `blocks[1].MaxTime = 14400000`. So a metric with `t = 7200000` would be in `blocks[0]` and `blocks[1]` at the same time.

A possible way to fix this would be to make `block.MaxTime` an exclusive upper bound (so a chunk would only be included if `chunk.MinTime < block.MaxTime && block.MinTime <= chunk.MaxTime`). The other option would be to make sure that blocks don't overlap by the current definition of `intervalOverlap()`, i.e. make sure that `blocks[1].MinTime = blocks[0].MaxTime + 1`.

I've taken that second approach in this proof of concept.
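To see concretely why a boundary chunk is stored twice, here is a small sketch of the inclusive overlap check described above (the function body mirrors the logic; the surrounding `main` and its values are illustrative):

```go
package main

import "fmt"

// intervalOverlap mirrors the inclusive check: two closed intervals
// [amin, amax] and [bmin, bmax] overlap if they share any point.
func intervalOverlap(amin, amax, bmin, bmax int64) bool {
	return amin <= bmax && bmin <= amax
}

func main() {
	// Two adjacent 2-hour blocks that share an edge at t = 7200000.
	const edge = int64(7200000)
	block0 := [2]int64{0, edge}
	block1 := [2]int64{edge, 2 * edge}

	// A chunk cut exactly at the boundary: chunk.MinTime == block0.MaxTime.
	chunkMin, chunkMax := edge, edge+60000

	fmt.Println(intervalOverlap(chunkMin, chunkMax, block0[0], block0[1])) // true
	fmt.Println(intervalOverlap(chunkMin, chunkMax, block1[0], block1[1])) // true
	// The chunk overlaps both blocks, so the same data is persisted twice.
}
```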
To make reproducing the issue and testing the fix easier, I patched Prometheus to round timestamps down to the minute (so that they would fall on the block boundary): https://github.com/BenoitKnecht/prometheus/tree/fix-block-boundaries. You can use it with `prometheus.yml` containing: