This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Keep series that are still in WAL in checkpoints #577

Merged (1 commit) on Apr 9, 2019

Conversation

brian-brazil
Contributor

@brian-brazil brian-brazil commented Apr 4, 2019

If all the samples are deleted for a series,
we should still keep the series in the WAL as
anything else reading the WAL will still care
about it in order to understand the samples.

fixes: #21

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
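
A minimal sketch of the idea in the description, not the code from this PR: when a checkpoint is written, a series record is kept if the series is still live or if it was deleted but later WAL segments may still hold samples that reference it. The `seriesTracker` type and `keepInCheckpoint` method are hypothetical names for illustration; only the `deleted map[uint64]int` shape comes from the diff in this conversation.

```go
package main

import "fmt"

// seriesTracker is a hypothetical stand-in for the relevant parts of Head.
type seriesTracker struct {
	live    map[uint64]struct{} // series currently present in the head
	deleted map[uint64]int      // deleted series ref -> WAL segment it must be kept until
}

// keepInCheckpoint reports whether the series record for ref should be
// copied into a new checkpoint. Deleted series are kept as long as they
// are still tracked, so WAL readers can resolve samples that reference them.
func (s *seriesTracker) keepInCheckpoint(ref uint64) bool {
	if _, ok := s.live[ref]; ok {
		return true
	}
	_, ok := s.deleted[ref]
	return ok
}

func main() {
	s := &seriesTracker{
		live:    map[uint64]struct{}{1: {}},
		deleted: map[uint64]int{2: 5},
	}
	for _, ref := range []uint64{1, 2, 3} {
		fmt.Printf("series %d kept: %v\n", ref, s.keepInCheckpoint(ref))
	}
}
```

In this sketch, series 3 is neither live nor tracked as deleted, so it is the only one dropped from the checkpoint.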

Contributor

@cstyan cstyan left a comment

LGTM, just one area where I think a comment would be useful.

Your commit message is already descriptive of the issue, but ideally we don't have to git blame or git log the file to see why the change was made.

head.go (resolved)
@krasi-georgiev
Contributor

krasi-georgiev commented Apr 5, 2019

@brian-brazil that is probably the root cause for #21.

Some people report that it appears after a crash, while others see it without a crash, so it seems they are hitting this bug.

@krasi-georgiev
Contributor

I think instead of this we should just drop all irrelevant samples as per this PR
#568

@brian-brazil
Contributor Author

I believe this will fix #21. There should be no irrelevant samples.

@krasi-georgiev
Contributor

I think instead of this we should just drop all irrelevant samples as per this PR
#568

I think this should be the correct fix for this. Instead of keeping series we should just delete the irrelevant samples.

Collaborator

@gouthamve gouthamve left a comment

Looks good!

@@ -75,6 +75,9 @@ type Head struct {
	symbols map[string]struct{}
	values  map[string]stringset // label names to possible values

	deletedMtx sync.Mutex
	deleted    map[uint64]int // Deleted series, and what WAL segment they must be kept until.
Contributor

must be kept until what?

Suggested change
deleted map[uint64]int // Deleted series, and what WAL segment they must be kept until.
deleted map[uint64]int // Deleted series, and what WAL segment they must be kept until the last segment has samples referencing these.

Contributor Author

Your suggestion doesn't make sense. This is a data structure, not an algorithm.

		_, err := app.Add(labels.Labels{{"a", "b"}}, int64(i), 0)
		testutil.Ok(t, err)
		testutil.Ok(t, app.Commit())
	}
Contributor

Let's extract all this into a separate func, like in my other PR:
https://github.com/prometheus/tsdb/pull/467/files#diff-7ae027963734feb4c8722aa99f033363R165

createHead(t, genSeries(....))

Contributor Author

That'd clash with the existing name, plus that function doesn't have the WAL handling I need.

head_test.go Outdated
	// Confirm there's been a checkpoint.
	cdir, _, err := LastCheckpoint(dir)
	testutil.Ok(t, err)
	// Read in checkpoint and WAL
Contributor

Suggested change
// Read in checkpoint and WAL
// Read in checkpoint and WAL.

Contributor Author

Done

head.go Outdated
@@ -570,6 +580,17 @@ func (h *Head) Truncate(mint int64) (err error) {
		// that supersedes them.
		level.Error(h.logger).Log("msg", "truncating segments failed", "err", err)
	}

	// The checkpoint is written and segments before it truncated, so we no
Contributor

Suggested change
// The checkpoint is written and segments before it truncated, so we no
// The checkpoint is written and segments before it is truncated, so we no

Contributor Author

Done
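
The code comment quoted in this thread is cut off, so the following is only a sketch of one plausible reading: once a checkpoint is written and older segments are truncated, entries in the `deleted` map that point at removed segments can be forgotten. The `deletedSeries` type and its methods are hypothetical, and the `segment < first` rule is an assumption rather than something the excerpt confirms.

```go
package main

import (
	"fmt"
	"sync"
)

// deletedSeries is a hypothetical tracker mirroring the deletedMtx/deleted
// pair shown in the diff: series ref -> WAL segment it must be kept until.
type deletedSeries struct {
	mtx     sync.Mutex
	entries map[uint64]int
}

func newDeletedSeries() *deletedSeries {
	return &deletedSeries{entries: map[uint64]int{}}
}

// markDeleted records that a deleted series may still be referenced by
// samples in WAL segments up to and including lastSegment.
func (d *deletedSeries) markDeleted(ref uint64, lastSegment int) {
	d.mtx.Lock()
	defer d.mtx.Unlock()
	d.entries[ref] = lastSegment
}

// dropBefore forgets every series whose segment is older than first, the
// first segment that survives truncation. It returns how many were removed.
func (d *deletedSeries) dropBefore(first int) int {
	d.mtx.Lock()
	defer d.mtx.Unlock()
	removed := 0
	for ref, segment := range d.entries {
		if segment < first {
			delete(d.entries, ref)
			removed++
		}
	}
	return removed
}

// tracked reports whether ref is still remembered as a deleted series.
func (d *deletedSeries) tracked(ref uint64) bool {
	d.mtx.Lock()
	defer d.mtx.Unlock()
	_, ok := d.entries[ref]
	return ok
}

func main() {
	d := newDeletedSeries()
	d.markDeleted(10, 3)
	d.markDeleted(11, 7)
	fmt.Println("removed:", d.dropBefore(5))
	fmt.Println("ref 11 still tracked:", d.tracked(11))
}
```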

@@ -501,6 +501,57 @@ func TestDeleteUntilCurMax(t *testing.T) {
testutil.Ok(t, err)
testutil.Equals(t, []tsdbutil.Sample{sample{11, 1}}, ressmpls)
}

func TestDeletedSamplesAndSeriesStillInWALAfterCheckpoint(t *testing.T) {
Contributor

Maybe extend the test a bit: after the checkpoint, add more samples for existing series, then read the WAL from scratch and ensure that there are no samples with unknown series references.

Contributor Author

I'm not quite sure what you're getting at. There are no existing series after we've read the checkpoint, as we've deleted them.
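
For readers unfamiliar with the failure being discussed, here is a rough sketch of what "unknown series references" means when replaying a log. It does not use the tsdb WAL API: the `record` type and `unknownRefs` function are hypothetical and model the WAL as an ordered list of records that either introduce series refs or carry samples pointing at them.

```go
package main

import "fmt"

// record is a hypothetical, simplified WAL record: it may introduce series
// refs, carry samples that point at series refs, or both.
type record struct {
	series  []uint64 // refs defined by a series record
	samples []uint64 // refs used by samples in this record
}

// unknownRefs replays records in order and returns every sample ref that
// was not introduced by an earlier (or the same) record.
func unknownRefs(records []record) []uint64 {
	known := map[uint64]struct{}{}
	var unknown []uint64
	for _, r := range records {
		for _, ref := range r.series {
			known[ref] = struct{}{}
		}
		for _, ref := range r.samples {
			if _, ok := known[ref]; !ok {
				unknown = append(unknown, ref)
			}
		}
	}
	return unknown
}

func main() {
	// The checkpoint kept series 1 but dropped series 2, while a later
	// segment still holds a sample for series 2.
	records := []record{
		{series: []uint64{1}},
		{samples: []uint64{1, 2}},
	}
	fmt.Println("unknown refs:", unknownRefs(records))
}
```

Keeping the series record for ref 2 in the checkpoint, as this PR proposes, would make the returned list empty in this model.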

If all the samples are deleted for a series,
we should still keep the series in the WAL as
anything else reading the WAL will still care
about it in order to understand the samples.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
@krasi-georgiev
Contributor

LGTM

Successfully merging this pull request may close these issues.

Unknown series references in WAL after crash
4 participants