
Enforce retention policies #2776

Merged: 10 commits from enforce_retention_2_phase into master on Jun 5, 2015

Conversation

@otoolep (Contributor) commented Jun 4, 2015

This change adds a new service to the system for Retention Policy enforcement.

Enforcement is composed of two parts:

  • a goroutine that runs only on the leader, which periodically walks all ShardGroups and deletes any that have expired. Note that with this change ShardGroups are not actually deleted; instead a tombstone flag is set on them, making it explicit that a ShardGroup, and its associated shards, can be safely deleted.
  • a second goroutine that runs on every node, which checks each shard in the TSDB store against the list of shards that should actually exist. If a shard belongs to a tombstoned ShardGroup it is deleted. (A rough sketch of this two-phase flow follows this list.)
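
A minimal sketch of the two-phase idea, in Go. All names here (ShardGroup, RetentionPolicy, tombstoneExpired, deleteTombstonedShards, the deleteShard callback) are illustrative placeholders, not the actual API added by this PR.

package retention

import "time"

// Illustrative placeholder types; not the real meta-store structures.
type ShardGroup struct {
    ID        uint64
    EndTime   time.Time
    Tombstone bool
}

type RetentionPolicy struct {
    Duration    time.Duration
    ShardGroups []*ShardGroup
}

// Phase 1: runs only on the leader. Expired shard groups are tombstoned,
// not deleted, so removal of their shards remains an explicit second step.
func tombstoneExpired(policies []*RetentionPolicy, now time.Time) {
    for _, rp := range policies {
        for _, sg := range rp.ShardGroups {
            if now.Sub(sg.EndTime) > rp.Duration {
                sg.Tombstone = true
            }
        }
    }
}

// Phase 2: runs on every node. Any local shard whose owning shard group
// has been tombstoned is deleted from the local TSDB store.
func deleteTombstonedShards(localShards map[uint64]*ShardGroup, deleteShard func(shardID uint64)) {
    for shardID, sg := range localShards {
        if sg.Tombstone {
            deleteShard(shardID)
        }
    }
}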

@otoolep force-pushed the enforce_retention_2_phase branch 6 times, most recently from 7e80aaa to 676b62c, on June 5, 2015 00:00
@otoolep force-pushed the enforce_retention_2_phase branch from 676b62c to 99f68d4 on June 5, 2015 00:01
@otoolep force-pushed the enforce_retention_2_phase branch from 99f68d4 to c9c59d0 on June 5, 2015 00:02
@@ -70,6 +72,13 @@ func (s *Server) appendClusterService(c cluster.Config) {
    s.Services = append(s.Services, srv)
}

func (s *Server) appendRetentionPolicyService(c retention.Config) {
    srv := retention.NewService(c)
Contributor:
To be consistent with the current pattern, we should do:

if !c.Enabled {
     return
}
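
For reference, a sketch of the function above with the suggested guard applied. Server, retention.Config, and retention.NewService are taken from the surrounding diff; this is not the final committed code.

func (s *Server) appendRetentionPolicyService(c retention.Config) {
    if !c.Enabled {
        return
    }
    srv := retention.NewService(c)
    s.Services = append(s.Services, srv)
}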

@corylanou (Contributor) commented:
Looks good. I'm assuming we are adding test coverage in a future PR.

type ShardGroupInfo struct {
    ID        uint64
    StartTime time.Time
    EndTime   time.Time
    Shards    []ShardInfo
    Tombstone bool
Contributor:
Might be better to make this a time.Time so we can later go and expunge shard groups that were tombstoned long ago.

Also, the function names and comments refer to deleting a shard group. I think it would be clearer to name this DeletedAt, personally.

Contributor (Author):
Yeah, using time.Time isn't a bad idea. Renaming to DeletedAt works for me.
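
A sketch of the agreed shape, with DeletedAt as a time.Time. ShardInfo is as in the diff above; the Deleted helper is an assumed convenience method, not necessarily what was committed.

type ShardGroupInfo struct {
    ID        uint64
    StartTime time.Time
    EndTime   time.Time
    Shards    []ShardInfo
    DeletedAt time.Time // zero value means the shard group is still live
}

// Deleted reports whether the shard group has been tombstoned.
func (sgi *ShardGroupInfo) Deleted() bool {
    return !sgi.DeletedAt.IsZero()
}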

@otoolep (Contributor, Author) commented Jun 5, 2015

Thanks for the valuable feedback @jwilder @corylanou

@otoolep force-pushed the enforce_retention_2_phase branch 2 times, most recently from 662b317 to 00c262d, on June 5, 2015 04:44
@otoolep force-pushed the enforce_retention_2_phase branch from 00c262d to ae3e8c7 on June 5, 2015 04:47
func (s *Service) deleteShardGroups() {
    defer s.wg.Done()

    ticker := time.NewTicker(s.checkInterval)
Contributor:
Still missing the defer ticker.Stop()
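
A sketch of the requested pattern, assuming the service signals shutdown over a done channel (the channel name and the tick body are illustrative):

func (s *Service) deleteShardGroups() {
    defer s.wg.Done()

    ticker := time.NewTicker(s.checkInterval)
    defer ticker.Stop() // release the ticker when the goroutine exits

    for {
        select {
        case <-s.done:
            return
        case <-ticker.C:
            // walk the shard groups and tombstone any that have expired
        }
    }
}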

@jwilder (Contributor) commented Jun 5, 2015

Two small items, but otherwise looks good. 👍

@otoolep (Contributor, Author) commented Jun 5, 2015

Thanks again @corylanou @jwilder -- final changes pushed up. Will perform more system testing now before merging.

@otoolep (Contributor, Author) commented Jun 5, 2015

Since this has passed basic unit tests and has been reviewed, I'm going to merge now. There are a couple of other small changes that are needed post-testing, and it will be easier to review that way.

@toddboom (Contributor) commented Jun 5, 2015

+1

otoolep added a commit that referenced this pull request on Jun 5, 2015
@otoolep merged commit ca9f231 into master on Jun 5, 2015
@otoolep deleted the enforce_retention_2_phase branch on June 5, 2015 18:15
e-dard added a commit that referenced this pull request on Oct 26, 2017
Fixes #8819.

Previously, the process of dropping expired shards according to the
retention policy duration was managed by two independent goroutines in
the retention policy service. This behaviour was introduced in #2776,
at a time when there were both data and meta nodes in the OSS codebase.
The idea was that only the leader meta node would run the meta data
deletions in the first goroutine, and all other nodes would run the
local deletions in the second goroutine.

InfluxDB no longer operates in that way, so we ended up with two
independent goroutines carrying out actions that were really dependent
on each other.

If the second goroutine runs before the first, it may not see the
meta data changes indicating that shards should be deleted, and so it
won't delete any shards locally. Shortly after this the first goroutine
will run and remove the meta data for the shard groups.

This results in a situation where it looks like the shards have gone,
but in fact they remain on disk (and, importantly, their series remain
in the index) until the next time the second goroutine runs. By default
that's 30 minutes.

In the case where dropping the shards would have removed the last
occurrences of some series, it's possible that, if the database was
already at its maximum series limit (or tag limit, for that matter),
no further new series could be inserted.
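
A sketch of the shape this dependency implies: both phases driven from a single goroutine, in order, so local shard deletion always sees the meta data tombstones written in the same pass. All method names here are illustrative, not the actual post-#8819 code.

func (s *Service) run() {
    defer s.wg.Done()

    ticker := time.NewTicker(s.checkInterval)
    defer ticker.Stop()

    for {
        select {
        case <-s.done:
            return
        case <-ticker.C:
            // Phase 1: mark expired shard groups as deleted in the meta data.
            s.markExpiredShardGroups()
            // Phase 2: immediately drop the corresponding local shards, so they
            // never linger on disk (and in the index) until a later, independent pass.
            s.deleteLocalShards()
        }
    }
}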