Expired shard's directory no longer getting deleted since 0.8.4? #1078

Closed
jvshahid wants to merge 1 commit from the fix-1078 branch

Conversation

jvshahid (Contributor)

Since we upgraded from 0.8.3 to 0.8.4 (and subsequently to 0.8.5), it looks like InfluxDB no longer deletes the directories containing expired shards' data. This is the log output from around the latest shard rollover which took place today at 01:00 CET:

Oct 30 00:32:30 influx influxdb[13368]: [10/30/14 00:32:30] [INFO] Checking for shards to drop
Oct 30 00:40:19 influx influxdb[13368]: [10/30/14 00:40:19] [INFO] Checking for shards to drop
Oct 30 00:50:19 influx influxdb[13368]: [10/30/14 00:50:19] [INFO] Checking for shards to drop
Oct 30 01:00:00 influx influxdb[13368]: [10/30/14 01:00:00] [INFO] No matching shards for write at time 1414627200000000u, creating...
Oct 30 01:00:00 influx influxdb[13368]: [10/30/14 01:00:00] [INFO] createShards for space default: start: Thu Oct 30 01:00:00 +0100 CET 2014. end: Fri Oct 31 01:00:00 +0100 CET 2014
Oct 30 01:00:01 influx influxdb[13368]: [10/30/14 01:00:01] [INFO] DATASTORE: opening or creating shard /var/influxdb/db/shard_db_v2/00066
Oct 30 01:00:01 influx influxdb[13368]: [10/30/14 01:00:01] [INFO] Adding shard to default: 66 - start: Thu Oct 30 01:00:00 +0100 CET 2014 (1414627200). end: Fri Oct 31 01:00:00 +0100 CET 2014 (1414713600). isLocal: true. servers: [1]
Oct 30 01:02:15 influx influxdb[13368]: [10/30/14 01:02:15] [INFO] Checking for shards to drop
Oct 30 01:10:19 influx influxdb[13368]: [10/30/14 01:10:19] [INFO] Checking for shards to drop
Oct 30 01:20:19 influx influxdb[13368]: [10/30/14 01:20:19] [INFO] Checking for shards to drop

At this time, the older shard number 61 expired and was removed from the configuration (i.e. it's no longer visible in the admin GUI), but unlike the behavior in 0.8.3, the directory containing its RocksDB data didn't get removed.

jvshahid self-assigned this Oct 30, 2014
Background of the bug: prior to this patch we actually tried writing points that were older than the retention period of the shard. This caused a race condition when writing points to a shard that's being dropped, which will happen frequently if the user is loading old data (by accident). This is demonstrated in the test in this commit. This bug was previously addressed in #985, but it turns out the fix for #985 wasn't enough: a user reported in #1078 that some shards are left behind and never deleted.

It turns out that while a shard is being dropped, more write requests can come in and end up on line `cluster/shard.go:195`, which causes the datastore to create a shard on disk that isn't tracked anywhere in the metadata. That shard will live forever and never get deleted. This fix addresses the issue by not writing the old points at all; there are still some edge cases in the current implementation, but it's at least not as bad as current master.
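
As a rough illustration of the guard this commit describes (not the actual patch), here is a minimal sketch in Go; the `Point` type and `filterExpiredPoints` helper are hypothetical stand-ins for the real protobuf types and write path:

```go
package coordinator

import "time"

// Point is a hypothetical stand-in for a single data point; the real
// InfluxDB 0.8 code uses protobuf-generated types instead.
type Point struct {
	Timestamp time.Time
	Value     float64
}

// filterExpiredPoints drops points that are already older than the
// retention period before they are routed to shards. Writing such
// points could otherwise race with a shard being dropped and recreate
// its directory on disk without any metadata tracking it.
func filterExpiredPoints(points []Point, retention time.Duration, now time.Time) []Point {
	cutoff := now.Add(-retention)
	kept := points[:0] // reuse the backing array; relative order is preserved
	for _, p := range points {
		if p.Timestamp.Before(cutoff) {
			continue // too old: refuse the write instead of resurrecting a dropped shard
		}
		kept = append(kept, p)
	}
	return kept
}
```

Dropping too-old points up front means a write can never reach the `cluster/shard.go:195` path for a shard that is concurrently being dropped.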
jvshahid (Contributor) commented Nov 3, 2014

@dgnorton @toddboom can you guys take a look?

dgnorton (Contributor) commented Nov 3, 2014

@jvshahid the fix lgtm. I think the `for i := 0; i < len(series.Points); {` loop in `func (self *Coordinator) CommitSeriesData` should be refactored in a separate issue to use a ranged loop and pull some things out of the loop.

@@ -507,20 +507,31 @@ func (self *Coordinator) CommitSeriesData(db string, serieses []*protocol.Series

```go
// TODO: this isn't needed anymore
series.SortPointsTimeDescending()

sn := series.GetName()
// use regular for loop since we update the iteration index `i` as
// we batch points that have the same timestamp
for i := 0; i < len(series.Points); {
```
Inline review comment (Contributor) on this hunk:

In a separate issue, I think we should refactor this loop to use a ranged loop and pull some things out of the loop.
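
A rough sketch of what that refactor might look like, not actual InfluxDB code: the `Point` type below is a hypothetical stand-in for the protobuf point type, and `batchByTimestamp` is an invented helper. The idea is to pre-group points that share a timestamp so the write path can iterate with a ranged loop instead of advancing the index by hand:

```go
package coordinator

// Point is a minimal hypothetical stand-in; only the timestamp matters
// for the batching shown here.
type Point struct {
	Timestamp int64
}

// batchByTimestamp groups consecutive points sharing a timestamp so the
// caller can use a ranged loop instead of updating the index `i` by
// hand. It assumes the points are already sorted by time, as they are
// after SortPointsTimeDescending.
func batchByTimestamp(points []*Point) [][]*Point {
	var batches [][]*Point
	for _, p := range points {
		if n := len(batches); n > 0 && batches[n-1][0].Timestamp == p.Timestamp {
			batches[n-1] = append(batches[n-1], p)
			continue
		}
		batches = append(batches, []*Point{p})
	}
	return batches
}
```

CommitSeriesData could then range over `batchByTimestamp(series.Points)`, and per-series work such as `series.GetName()` naturally moves out of the loop.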

jvshahid closed this in 1f5f5cb Nov 3, 2014
jvshahid removed the review label Nov 3, 2014
toddboom deleted the fix-1078 branch May 5, 2015