Expired shard's directory no longer getting deleted since 0.8.4? #1078

Closed
jvshahid wants to merge 1 commit from the fix-1078 branch

Conversation

jvshahid (Contributor)

Since we upgraded from 0.8.3 to 0.8.4 (and subsequently to 0.8.5), it looks like InfluxDB no longer deletes the directories containing expired shards' data. This is the log output from around the latest shard rollover which took place today at 01:00 CET:

Oct 30 00:32:30 influx influxdb[13368]: [10/30/14 00:32:30] [INFO] Checking for shards to drop
Oct 30 00:40:19 influx influxdb[13368]: [10/30/14 00:40:19] [INFO] Checking for shards to drop
Oct 30 00:50:19 influx influxdb[13368]: [10/30/14 00:50:19] [INFO] Checking for shards to drop
Oct 30 01:00:00 influx influxdb[13368]: [10/30/14 01:00:00] [INFO] No matching shards for write at time 1414627200000000u, creating...
Oct 30 01:00:00 influx influxdb[13368]: [10/30/14 01:00:00] [INFO] createShards for space default: start: Thu Oct 30 01:00:00 +0100 CET 2014. end: Fri Oct 31 01:00:00 +0100 CET 2014
Oct 30 01:00:01 influx influxdb[13368]: [10/30/14 01:00:01] [INFO] DATASTORE: opening or creating shard /var/influxdb/db/shard_db_v2/00066
Oct 30 01:00:01 influx influxdb[13368]: [10/30/14 01:00:01] [INFO] Adding shard to default: 66 - start: Thu Oct 30 01:00:00 +0100 CET 2014 (1414627200). end: Fri Oct 31 01:00:00 +0100 CET 2014 (1414713600). isLocal: true. servers: [1]
Oct 30 01:02:15 influx influxdb[13368]: [10/30/14 01:02:15] [INFO] Checking for shards to drop
Oct 30 01:10:19 influx influxdb[13368]: [10/30/14 01:10:19] [INFO] Checking for shards to drop
Oct 30 01:20:19 influx influxdb[13368]: [10/30/14 01:20:19] [INFO] Checking for shards to drop

At this time, the older shard number 61 expired and was removed from the configuration (i.e. it's no longer visible in the admin GUI), but unlike the behavior in 0.8.3, the directory containing its RocksDB data didn't get removed.

jvshahid self-assigned this Oct 30, 2014
Background of the bug: prior to this patch we actually tried writing points that were older than the retention period of the shard. This caused a race condition when writing points to a shard that's being dropped, which will happen frequently if the user is loading old data (by accident). This is demonstrated in the test in this commit. This bug was previously addressed in #985, but it turns out the fix for #985 wasn't enough: a user reported in #1078 that some shards are left behind and never deleted.

It turns out that while a shard is being dropped, more write requests can come in and end up on line `cluster/shard.go:195`, which causes the datastore to create a shard on disk that isn't tracked anywhere in the metadata. That shard will live forever and never get deleted. This fix addresses the issue by not writing the old points at all; there are still some edge cases in the current implementation, but it's at least not as bad as current master.
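
As a rough illustration of the guard this commit describes (not the actual patch), here is a minimal sketch in Go; the `Point` type and `filterExpiredPoints` helper are hypothetical stand-ins for the real protobuf types and write path:

```go
package coordinator

import "time"

// Point is a hypothetical stand-in for a single data point; the real
// InfluxDB 0.8 code uses protobuf-generated types instead.
type Point struct {
	Timestamp time.Time
	Value     float64
}

// filterExpiredPoints drops points that are already older than the
// retention period before they are routed to shards. Writing such
// points could otherwise race with a shard being dropped and recreate
// its directory on disk without any metadata tracking it.
func filterExpiredPoints(points []Point, retention time.Duration, now time.Time) []Point {
	cutoff := now.Add(-retention)
	kept := points[:0] // reuse the backing array; relative order is preserved
	for _, p := range points {
		if p.Timestamp.Before(cutoff) {
			continue // too old: refuse the write instead of resurrecting a dropped shard
		}
		kept = append(kept, p)
	}
	return kept
}
```

Dropping too-old points up front means a write can never reach the `cluster/shard.go:195` path for a shard that is concurrently being dropped.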
jvshahid (Contributor) commented Nov 3, 2014

@dgnorton @toddboom can you guys take a look?

dgnorton (Contributor) commented Nov 3, 2014

@jvshahid the fix lgtm. I think the `for i := 0; i < len(series.Points); {` loop in `func (self *Coordinator) CommitSeriesData` should be refactored in a separate issue to use a ranged loop and pull some things out of the loop.

@@ -507,20 +507,31 @@ func (self *Coordinator) CommitSeriesData(db string, serieses []*protocol.Series

```go
// TODO: this isn't needed anymore
series.SortPointsTimeDescending()

sn := series.GetName()
// use regular for loop since we update the iteration index `i` as
// we batch points that have the same timestamp
for i := 0; i < len(series.Points); {
```
Inline review comment (Contributor) on this hunk:

In a separate issue, I think we should refactor this loop to use a ranged loop and pull some things out of the loop.
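
A rough sketch of what that refactor might look like, not actual InfluxDB code: the `Point` type below is a hypothetical stand-in for the protobuf point type, and `batchByTimestamp` is an invented helper. The idea is to pre-group points that share a timestamp so the write path can iterate with a ranged loop instead of advancing the index by hand:

```go
package coordinator

// Point is a minimal hypothetical stand-in; only the timestamp matters
// for the batching shown here.
type Point struct {
	Timestamp int64
}

// batchByTimestamp groups consecutive points sharing a timestamp so the
// caller can use a ranged loop instead of updating the index `i` by
// hand. It assumes the points are already sorted by time, as they are
// after SortPointsTimeDescending.
func batchByTimestamp(points []*Point) [][]*Point {
	var batches [][]*Point
	for _, p := range points {
		if n := len(batches); n > 0 && batches[n-1][0].Timestamp == p.Timestamp {
			batches[n-1] = append(batches[n-1], p)
			continue
		}
		batches = append(batches, []*Point{p})
	}
	return batches
}
```

CommitSeriesData could then range over `batchByTimestamp(series.Points)`, and per-series work such as `series.GetName()` naturally moves out of the loop.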

jvshahid closed this in 1f5f5cb Nov 3, 2014
jvshahid removed the review label Nov 3, 2014
toddboom deleted the fix-1078 branch May 5, 2015