Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix index inconsistency after deletes #8576

Merged
merged 2 commits into from
Jul 7, 2017
Merged

Fix index inconsistency after deletes #8576

merged 2 commits into from
Jul 7, 2017

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented Jul 7, 2017

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

This fixes an issue where the inmem index would be inconsistent with the data stored on disk. It also fixes a bug where deletes do not write tombstones for data that should be deleted.

The inconsistency can occur under write load to a measurement and concurrently dropping the measurement. There is a race where the incoming writes create index entries, but the writes have not hit the cache/wal yet. The delete completes and removes them series from the index and then the write completes and stores the data on disk. The end result is that the data is on disk, but the index is inconsistent and is not aware that it exists within the shard. Restarting the server causes the index to be reloaded and the data appears again.

There wasn't a good way to fix this without significantly reworking how the index and engine work together or introducing a lot of lock contention mainly to handle this cases of deletes. I went with an approach that refreshes the index from the newly compacted files as well as what is remaining in the cache when the files are installed for use. This resolves the issue, but there can be a brief period of time between when the the delete removes the series from the index and before the index is refreshed where series can appear missing.

Copy link
Contributor

@e-dard e-dard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the redundant return; everything else i just a nit or suggestion.

}

// onFileStoreReplace is callback handler invoked when the FileStore
// has replace one set of TSM files with a new set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit s/replace/replaced

readers = append(readers, ch)

go func(c chan seriesKey, r TSMFile) {
n := r.KeyCount()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could just put i < r.KeyCount()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyCount ends up taking RLocks on the reader and I didn't want to lock the readers with every loop iteration.

e.Cache.ApplyEntryFn(func(key string, entry *entry) error {
fieldType, err := entry.values.InfluxQLType()
if err != nil {
// ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we at least log any errors here, or are we expecting them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be any errors here. I can log them though.

return nil
})

return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return is redundant.

@@ -895,7 +895,7 @@ func (d *indirectIndex) OverlapsTimeRange(min, max int64) bool {

// OverlapsKeyRange returns true if the min and max keys of the file overlap the arguments min and max.
func (d *indirectIndex) OverlapsKeyRange(min, max string) bool {
return d.minKey <= max && d.maxKey >= max
return d.minKey <= max && d.maxKey >= min
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it could have been the source of other bugs unrelated to the dropping/missing data issue. Do we have any other open issues that could be hitting this, and how do you think they would manifest themselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only called via DeleteRange and would appear that a delete didn't actually delete everything.

The in-memory index can get out of sync when deletes and writes
to the same measurement are running concurrently.  The index is
updated independently from data on disk and it's possible for the
index to unassign a shard when data still exists on disk.  What happens
is that there are TSM files on disk, but the index does not know that
the series that exist in those files still are in the shard.  Restarting
the server reloads the index and the data is visible again.  From and
end user perspective, this can look like more data is deleted than should
have been or that deleted data re-appears after a restart or writes to the
shard occur again.

There isn't an easy way to resolve this since the index and storage
are not transactional resources and we cannot atomically commit or
rollback changes to both at once.

As a workaround, after new TSM files are installed, we refresh the
index with series keys that exist in the new tsm files as well as
any lingering data still in the cache.  There is a small window of time
when the index may be missing series, but it will re-appear after the refresh
completes.
The min key was not used in OverlapsKeyRange which caused it to return
false when it should be true.  This causes a bug where deletes would not
write tombstones for files that actually contained the data it was supposed
to delete.
@jwilder
Copy link
Contributor Author

jwilder commented Jul 7, 2017

@e-dard All fixed.

@jwilder jwilder merged commit dba3ce1 into master Jul 7, 2017
@jwilder jwilder deleted the jw-delete-index branch July 7, 2017 20:36
@jwilder jwilder removed the review label Jul 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants