
Implement deletions #82

Merged

Merged 26 commits from gouthamve:deletes-1 into prometheus:master on May 27, 2017

Conversation

gouthamve (Member) commented on May 16, 2017

Todo:

  • Test the deletion
  • Put deletions on db (currently only on head and persisted blocks)
  • Handle deletions beyond current time.
  • Add locking to contend with compaction
  • Use WAL for headBlock
  • Revisit tombstone file format and add checksums
  • Optimise the tombstone lookup by using an Iterator instead of a map.
  • Add meta information: numTombstones.
  • Revisit all changed and new interfaces again.
    • Rename trange to interval.
    • Make []trange a new type named intervals.

Idea:
The tombstones live inside the index; whenever we look up a series' chunks, we attach the deleted ranges to the chunks. When a chunk returns an iterator, the iterator yields only values outside the deleted time ranges.

We load the tombstones from a file. When we delete, we replace that file with a new tombstone file and then run reloadBlocks() to update the index's tombstones.
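A minimal sketch of that bookkeeping, assuming the trange/intervals names from the todo list above and a simple map keyed by series reference (the real layout in this PR may differ):

// trange is a closed time range [mint, maxt]; intervals is a sorted list of them.
// Both names come from this PR's todo list; the field names are assumptions.
type trange struct{ mint, maxt int64 }
type intervals []trange

// tombstones maps a series reference to the time ranges deleted for it.
type tombstones map[uint32]intervals

// rangesFor returns the deleted ranges recorded for a series, or nil if none exist.
// A chunk lookup can attach this result to the chunks it returns, and the chunk's
// iterator then skips samples inside these ranges.
func (t tombstones) rangesFor(ref uint32) intervals {
	return t[ref]
}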

@fabxc



gouthamve added 3 commits May 14, 2017
Very much a WIP

fabxc (Member) left a comment

Looks great overall! I added some comments.

In terms of managing complexity, it seems like a good idea to not drag tombstones into the data we have now but rather add them as a separate filtering layer on querying.

My hunch is that it makes most sense to inject somewhere here in blockSeriesSet and chunkSeries.
Two benefits: in blockSeriesSet we can check whether tombstones completely cover certain chunks and remove them from the set to begin with. If no chunks remain, have the SeriesSet hide the entire series.
chunkSeries, rather than the ChunkMeta, can wrap the chunk iterator in a deletionIterator, which avoids leaking the deletion feature further than necessary.
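A rough sketch of that wrapping, reusing the trange type from the sketch above; deletionIterator and the iterator shape (Seek/At/Next/Err) follow the names used in this discussion, but the details are assumptions, not the PR's actual code:

// deletionIterator wraps an iterator over (timestamp, value) samples and skips
// any sample that falls inside a deleted time range.
type deletionIterator struct {
	it        SeriesIterator // underlying sample iterator
	intervals []trange       // sorted, non-overlapping deleted ranges
}

func (d *deletionIterator) Seek(t int64) bool {
	if !d.it.Seek(t) {
		return false
	}
	if ts, _ := d.it.At(); !isDeleted(d.intervals, ts) {
		return true
	}
	return d.Next()
}

func (d *deletionIterator) Next() bool {
	for d.it.Next() {
		if ts, _ := d.it.At(); !isDeleted(d.intervals, ts) {
			return true
		}
	}
	return false
}

func (d *deletionIterator) At() (int64, float64) { return d.it.At() }
func (d *deletionIterator) Err() error           { return d.it.Err() }

// isDeleted reports whether t lies within any of the deleted ranges.
func isDeleted(ivs []trange, t int64) bool {
	for _, iv := range ivs {
		if iv.mint <= t && t <= iv.maxt {
			return true
		}
	}
	return false
}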

head.go Outdated
h.mtx.RUnlock()

h.mtx.Lock() // We are modifying the tombstones here.
defer h.mtx.Unlock()

fabxc (Member), May 16, 2017:

Switching from read to write lock is always a bit dangerous as stuff can happen in between. If we always have to acquire the write lock anyway, it's typically not worth the headache. We can just acquire the write lock from the beginning.
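A minimal illustration of that suggestion, reusing the names from the excerpt above (the method signature itself is an assumption):

// Hold the write lock for the whole operation instead of upgrading from a read
// lock, so nothing can change the head between the unlock and the re-lock.
func (h *headBlock) Delete(mint, maxt int64, ms ...labels.Matcher) error {
	h.mtx.Lock() // We are modifying the tombstones here.
	defer h.mtx.Unlock()

	// ... match the series and record the deleted ranges ...
	return nil
}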

head.go Outdated
continue
}

h.tombstones[ref] = addNewInterval(rs, trange{mint, maxt})

fabxc (Member), May 16, 2017:

The way addNewInterval works, you should be able to simplify the above to

h.tombstones[ref] = addNewInterval(h.tombstones[ref], trange{mint, maxt})
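For reference, a simplified illustration of what addNewInterval has to do for that one-liner to be safe (this is an illustrative re-implementation, not the PR's actual code): it accepts the existing slice, which may be nil, and returns it with the new range merged in, keeping the list sorted and non-overlapping.

// addNewInterval merges n into a sorted, non-overlapping list of deleted ranges.
// Passing a nil slice is fine and simply yields a one-element list.
func addNewInterval(existing intervals, n trange) intervals {
	merged := make(intervals, 0, len(existing)+1)
	inserted := false

	for _, iv := range existing {
		switch {
		case iv.maxt < n.mint: // iv lies entirely before the new range
			merged = append(merged, iv)
		case n.maxt < iv.mint: // iv lies entirely after the new range
			if !inserted {
				merged = append(merged, n)
				inserted = true
			}
			merged = append(merged, iv)
		default: // overlap: grow the new range to swallow iv
			if iv.mint < n.mint {
				n.mint = iv.mint
			}
			if iv.maxt > n.maxt {
				n.maxt = iv.maxt
			}
		}
	}
	if !inserted {
		merged = append(merged, n)
	}
	return merged
}
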
head.go Outdated
return writeTombstoneFile(h.dir, newMapTombstoneReader(h.tombstones))
}

// Querier implements Queryable and headBlock

fabxc (Member), May 16, 2017:

All comments on their own line should end with a period.

head.go Outdated
@@ -524,6 +595,9 @@ func (h *headIndexReader) Series(ref uint32) (labels.Labels, []*ChunkMeta, error
MinTime: c.minTime,
MaxTime: c.maxTime,
Ref: (uint64(ref) << 32) | uint64(i),

deleted: deleted,
dranges: dranges,

fabxc (Member), May 16, 2017:

I see how this is getting tricky. Maybe it makes sense to not consider tombstones at this level. It's the low-level interface to access the data we have. Given that the data is still there, just extended by tombstones, maybe it should remain queryable.

At higher-level abstractions we can then combine that info with our known tombstones to provide the end result. We could just extend IndexReader with a Tombstone(ref) method. For the disk index this now means it's multiple files. Or, maybe cleaner, make tombstones their own additional concept on the same level as chunks and index.
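The first option could look roughly like this; only the Series signature is taken from the diff above, the Tombstones method is hypothetical:

// IndexReader, extended so callers can fetch a series' deleted ranges alongside
// its labels and chunk metas.
type IndexReader interface {
	// Series returns the labels and chunk metadata for the given series reference.
	Series(ref uint32) (labels.Labels, []*ChunkMeta, error)
	// Tombstones returns the deleted intervals recorded for the series, if any.
	Tombstones(ref uint32) (intervals, error)
	// ... remaining methods elided ...
}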

index.go Outdated
// TODO(gouthamve): Figure out where this function goes, index or block.
if err := writeTombstoneFile(dir, emptyTombstoneReader); err != nil {
return nil, err
}

fabxc (Member), May 16, 2017:

Ah, I see you were thinking about the same two options I mentioned above.

gouthamve added 6 commits May 16, 2017
Also add Seek() to TombstoneReader

We need to recalculate the sorted ref list every time we make a Tombstones() call. This avoids that.

gouthamve (Member, Author) commented on May 19, 2017

@fabxc So the only thing pending other than (a ton of) tests is the locking to ensure compaction doesn't mess with deletes. Instead of having a lock on each Block, I plan to wait for existing compactions to complete and stop compactions until Delete is done.

This functionality is required for backups anyway, and deletes need not be very fast, so I think it's the best way to go. WDYT?

fabxc (Member) commented on May 19, 2017 (comment minimized).

gouthamve added 4 commits May 20, 2017
gouthamve (Member, Author) commented on May 22, 2017

@fabxc So the changes since the last review:

  1. Made tombstones their own thing, like index and chunks.
  2. Serialisation of deletes and compaction is done via a mutex.
  3. Added tests for the tombstone iterators and headblock deletions.

The pending changes are listed in the first description. It'd be great if you gave it another review while I work on them. Also, I would like to rebase this on top of #83 once that is merged.

gouthamve force-pushed the gouthamve:deletes-1 branch from 0fb9d3c to 8434019 on May 22, 2017
gouthamve added 2 commits May 22, 2017
Misc. changes for code cleanliness.

fabxc (Member) left a comment

Little checkpoint. I'm not through yet :)

block.go Outdated
maxtime := chunks[len(chunks)-1].MaxTime
if maxtime > maxt {
maxtime = maxt
}

fabxc (Member), May 22, 2017:

What's the argument for clamping the max time but not the min time?
Currently, the clamping is probably not necessary as we cannot modify the block – but probably safer to do it.

gouthamve (Author, Member), May 22, 2017:

Yes, we cannot append before. But yeah, will do it.

it chunks.Iterator

intervals intervals
}

fabxc (Member), May 22, 2017:

Is there a reason for applying the tombstones to each chunk iterator rather than wrapping a full series iterator? Dragging it this far down makes it more invasive.

fabxc (Member), May 22, 2017:

I see, we use this to re-encode chunks in compaction.

compact.go Outdated
}
if err := tf.Close(); err != nil {
return errors.Wrap(err, "close tombstones file")
}

fabxc (Member), May 22, 2017:

Does this imply every block always has a tombstone file around?

gouthamve (Author, Member), May 22, 2017:

Yes, at least an empty tombstone file. I can treat a missing tombstone file as an empty one, if you want.

// Delete implements deletion of metrics.
func (db *DB) Delete(mint, maxt int64, ms ...labels.Matcher) error {
db.cmtx.Lock()
defer db.cmtx.Unlock()

fabxc (Member), May 22, 2017:

As you said, another mutex is not exactly pretty. An alternative may be adding a function channel for coordinating block tasks.

Let's say there was db.actorc of type chan func(). The DB's background routine would consume these functions in a for loop and execute them. That guarantees synchronization without the mutex. Additionally, if we decide to go the context route in #84 at some point, one can easily time out waiting for one's turn without having Delete block forever until it finally gets the mutex.

func (db *DB) delete(mint, maxt int64, ms ...labels.Matcher) error {
    // actual delete logic.
}

func (db *DB) Delete(ctx context.Context, mint, maxt int64, ms ...labels.Matcher) error {
    // Queue the delete on the actor channel and wait for the result, or give
    // up if the context is cancelled before the task could be queued.
    errc := make(chan error, 1)
    select {
    case <-ctx.Done():
        return ctx.Err()
    case db.actorc <- func() { errc <- db.delete(mint, maxt, ms...) }:
    }
    return <-errc
}
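The consuming side of that channel might look like this (db.actorc and the run loop are sketched from the comment above, not existing code):

// run is the DB's background routine: it executes queued functions one at a
// time, which serializes deletes with other block-level tasks without a mutex.
func (db *DB) run() {
	for fn := range db.actorc {
		fn()
	}
}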

gouthamve (Author, Member), May 23, 2017:

I will make this change in a later PR and would prefer if this PR went in with the current implementation.

db.go Outdated
db.cmtx.Lock()
defer db.cmtx.Unlock()

db.headmtx.RLock()

fabxc (Member), May 22, 2017:

I think that's the wrong mutex for db.blocks.

gouthamve (Author, Member), May 23, 2017:

Are you sure: https://github.com/prometheus/tsdb/blob/master/querier.go#L62-L64? If yes, I will change it there as well.

But it also makes sense to use headmtx, as blocks will change only when head blocks change?

fabxc (Member), May 23, 2017:

A compaction can also only change non-head blocks.
In the querier we hold the entire DB lock regardless, which we don't do here, right? Not 100% sure without looking properly again why we lock headmtx for the brief retrieval of blocksForRange – probably there was a reason somewhere

db.go Outdated
f := func() error {
return b.Delete(mint, maxt, ms...)
}
g.Go(f)

fabxc (Member), May 22, 2017:

No need to instantiate f. Just pass the function directly to g.Go. It's a matter of taste in the end, of course.

gouthamve (Author, Member), May 22, 2017:

g.Go needs a function that accepts no parameters. It needs a func() error. Any other way to do it?

fabxc (Member), May 22, 2017:

Yea, I just meant you can just do

g.Go(func() error { return b.Delete(mint, maxt, ms...) })
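For context, the surrounding pattern with golang.org/x/sync/errgroup would then read roughly like this (the blocks variable and loop are assumed from the diff context):

var g errgroup.Group

for _, b := range blocks {
	b := b // capture the loop variable for the closure
	g.Go(func() error { return b.Delete(mint, maxt, ms...) })
}

// Wait returns the first non-nil error returned by any of the Delete calls.
if err := g.Wait(); err != nil {
	return err
}
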
db.go Outdated
}
if amin >= bmin && amin <= bmax {
// Checks Overlap: http://stackoverflow.com/questions/3269434/
if amin <= bmax && bmin <= amax {

fabxc (Member), May 22, 2017:

Neat!

fabxc (Member), May 22, 2017:

I guess we can just return amin <= bmax && bmin <= amax then.
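As a tiny, self-contained version of that check (the helper name is hypothetical): two closed intervals overlap exactly when each one starts before the other ends.

// intervalOverlap reports whether [amin, amax] and [bmin, bmax] overlap.
func intervalOverlap(amin, amax, bmin, bmax int64) bool {
	return amin <= bmax && bmin <= amax
}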

require.NoError(t, pr.Err())
require.NoError(t, tr.Err())
}
return

fabxc (Member), May 22, 2017:

Is this function different from testFunc?

gouthamve (Author, Member), May 23, 2017:

Yes, in testFunc we are testing against the map, but here we are testing against a static range.

gouthamve added 4 commits May 23, 2017
gouthamve changed the title from "[WIP]: Deletions" to "Implement deletions" on May 23, 2017
@@ -0,0 +1,55 @@
# Tombstones Disk Format

The following describes the format of a tombstones file, which is the directory of a block.

fabxc (Member), May 23, 2017:

"placed in the top-level directory of a block"?

│ ├──────────────────────────────────────────────┤ │
│ │ Ref(stones start)<8b> │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘

fabxc (Member), May 23, 2017:

What is the advantage of having a level of indirection from stone to ranges?
Right now (and probably forever), we are scanning all stones and reading them into memory. In the general case, there's probably just a single deletion range per series (or a few). The total size of the ranges for a stone is still tiny, so separating them away from the stone itself just adds complexity to the format and related code.

It seems fine here to just make the file a sequence of <series_ref>, <#ranges>, <range1>, <range2>, ...

The reason why we chose indirection elsewhere is usually to apply deduplication (symbol table in the index) or to co-locate frequently accessed, small items better by moving larger contents belonging to them elsewhere (series in the index vs the chunks). Here neither is really a concern.

Probably fine to just CRC all entries once at the end. It will generally be in the single digit MB range even when deleting 200k series or so. At that scale, it will likely be applied via compaction directly because we can save a lot of space by removing data.
If a tombstone file is small enough to leave it around, a CRC32 is definitely sufficient.
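A rough sketch of the flat layout suggested here, with varint encoding and a single trailing CRC32; the function name, the reuse of the intervals type from the earlier sketch, and the exact encoding are assumptions, not the PR's final format:

import (
	"encoding/binary"
	"hash/crc32"
	"io"
)

// writeFlatTombstones writes, for each series: its reference, the number of
// deleted ranges, and the ranges themselves, then appends one CRC32 computed
// over all entries.
func writeFlatTombstones(w io.Writer, stones map[uint32]intervals) error {
	h := crc32.NewIEEE()
	mw := io.MultiWriter(w, h) // every byte written also feeds the checksum
	buf := make([]byte, binary.MaxVarintLen64)

	writeUvarint := func(x uint64) error {
		n := binary.PutUvarint(buf, x)
		_, err := mw.Write(buf[:n])
		return err
	}
	writeVarint := func(x int64) error {
		n := binary.PutVarint(buf, x)
		_, err := mw.Write(buf[:n])
		return err
	}

	for ref, ranges := range stones {
		if err := writeUvarint(uint64(ref)); err != nil {
			return err
		}
		if err := writeUvarint(uint64(len(ranges))); err != nil {
			return err
		}
		for _, r := range ranges {
			if err := writeVarint(r.mint); err != nil {
				return err
			}
			if err := writeVarint(r.maxt); err != nil {
				return err
			}
		}
	}
	// Trailing checksum over everything written above.
	_, err := w.Write(h.Sum(nil))
	return err
}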

gouthamve added 2 commits May 24, 2017
* Make sure no reads happen on the block when delete is in progress.
* Fix bugs in compaction.

block.go Outdated
@@ -88,6 +95,9 @@ type BlockMeta struct {
Compaction struct {
Generation int `json:"generation"`
} `json:"compaction"`

// The number of tombstones.
NumTombstones int64 `json:"numTombstones"`

fabxc (Member), May 26, 2017:

This should go into Stats, no?

block.go Outdated
}
if mint < chunks[0].MinTime {
mint = chunks[0].MinTime
}

fabxc (Member), May 26, 2017:

Can we add a small helper function a, b = clampInterval(a, b, min, max) and use it here?
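Such a helper could be as small as the following (hypothetical, matching the signature suggested above):

// clampInterval limits the interval [a, b] to the bounds [mint, maxt].
func clampInterval(a, b, mint, maxt int64) (int64, int64) {
	if a < mint {
		a = mint
	}
	if b > maxt {
		b = maxt
	}
	return a, b
}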

x := d.b[0]
d.b = d.b[1:]
return x

fabxc (Member), May 26, 2017:

Superfluous newline.

head.go Outdated
a.mtx.RUnlock()
return err
err.Add(a.wal.LogSeries(a.newLabels))
err.Add(a.wal.LogSamples(a.samples))

fabxc (Member), May 26, 2017:

That's not sane in general. If writing series failed and we write samples referencing them afterwards, reading back the WAL will cause issues.
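A minimal sketch of the ordering being asked for (the method names come from the excerpt above; the error handling around them is an assumption): bail out before logging samples if the series records could not be written, so the WAL never contains samples referencing missing series.

if err := a.wal.LogSeries(a.newLabels); err != nil {
	a.mtx.RUnlock()
	return err
}
if err := a.wal.LogSamples(a.samples); err != nil {
	a.mtx.RUnlock()
	return err
}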


// TombstoneReader is the iterator over tombstones.
type TombstoneReader interface {
At(ref uint32) intervals

fabxc (Member), May 26, 2017:

Maybe simply Get so it doesn't get confused with our iterator interface?

@@ -0,0 +1,198 @@
package tsdb

fabxc (Member), May 26, 2017:

License header missing.

block.go Outdated
pb := &persistedBlock{
dir: dir,
meta: *meta,
chunkr: cr,
indexr: ir,

fabxc (Member), May 26, 2017:

No need for newline.

block.go Outdated
@@ -150,6 +160,9 @@ type persistedBlock struct {

chunkr *chunkReader
indexr *indexReader

// For tombstones.
tombstones tombstoneReader

fabxc (Member), May 26, 2017:

Obvious comment. I think we can drop that.

@@ -0,0 +1,110 @@
package tsdb

fabxc (Member), May 26, 2017:

Missing license header

wal.go Outdated
}

// LogDeletes write a batch of new deletes to the log.
func (w *SegmentWAL) LogDeletes(tr tombstoneReader) error {

fabxc (Member), May 26, 2017:

Same. Exported methods must not expose unexported types.

* Expose Stone as it is used in an exported method.
* Move from tombstoneReader to []Stone for the same reason as above.
* Make WAL reading a little cleaner.

gouthamve (Member, Author) commented on May 26, 2017

@fabxc I think I covered all your feedback, could you give it another review?

gouthamve merged commit 3a5ae6b into prometheus:master on May 27, 2017
1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)
gouthamve (Member, Author) commented on May 27, 2017

@fabxc Per our discussion, I have manually tested this and merged it. Will open a PR for the deletes endpoint in Prometheus soon.

gouthamve referenced this pull request on Jun 8, 2017