Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configuration parameter to expose rate limit for TSM compaction. #9939

Merged
merged 1 commit into from
Aug 7, 2018

Conversation

zachgoldstein
Copy link
Contributor

@zachgoldstein zachgoldstein commented Jun 4, 2018

Closes: 9938

Required for all non-trivial PRs
  • Sign CLA (if not already signed)
Required only if applicable

You can erase any checkboxes below this note if they are not applicable to your Pull Request.

  • Config changes: update sample config (etc/config.sample.toml), server NewDemoConfig method, and Diagnostics methods reporting config settings, if necessary
  • InfluxData Documentation: issue filed or pull request submitted <link to issue or pull request>

@ghost ghost added the proposed label Jun 4, 2018
Copy link
Member

@jacobmarble jacobmarble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zachgoldstein the compaction throughput feature has been helpful for some InfluxDB and Enterprise users that I'm aware of, so thanks for exposing it better in this PR. You followed existing patterns, which made this a very straightforward review.

Because it took us so long to get around to this review, the PR won't merge, so I'm sorry to ask you to rebase against master, in addition to my code comments. Please let me know if you can't get to that right away and I'll find some time later this week to do it for you.

If you are able to rebase and follow-up to my review, then I'll build and test locally before merging.

@@ -97,6 +97,15 @@
# to cache snapshotting.
# max-concurrent-compactions = 0

# CompactThroughput is the rate limit in bytes per second that we
# will allow TSM compactions to happen. Note that short bursts are allowed
Copy link
Member

@jacobmarble jacobmarble Jul 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature exists to prevent long IO blocking, so instead of "compaction to happen." how about "compactions to write to disk.".

tsdb/config.go Outdated
@@ -37,6 +37,17 @@ const (
// will compact all TSM files in a shard if it hasn't received a write or delete
DefaultCompactFullWriteColdDuration = time.Duration(4 * time.Hour)

// DefaultCompactThroughput is the rate limit in bytes per second that we
// will allow TSM compactions to happen. Not that short bursts are allowed
// to happen at a possibly larger value, set by DefaultCompactBurstThroughput.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above wrt "happen".

# compact-throughput = "48m"

# CompactBurstThroughput is the rate limit in bytes per second that we
# will allow TSM compactions to happen.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, replace "happen" with clearer description.

tsdb/config.go Outdated
DefaultCompactThroughput = 0

// DefaultCompactBurstThroughput is the rate limit in bytes per second that we
// will allow TSM compactions to happen. If this is not set, the burst value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above wrt "happen".

tsdb/store.go Outdated
throughput := int(s.EngineOptions.Config.CompactThroughput)
burstThroughput := int(s.EngineOptions.Config.CompactBurstThroughput)
if throughput > 0 {
if burstThroughput == 0 {
Copy link
Member

@jacobmarble jacobmarble Jul 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want burst < throughput, so how about this?

if burstThroughput < throughput {

tsdb/store.go Outdated
s.Logger.Info(
"Compaction throughput enabled",
zap.String("throughput", fmt.Sprintf("%d%%", throughput)),
zap.String("throughput_burst", fmt.Sprintf("%d%%", burstThroughput)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, it's called "throughput_burst" but elsewhere in the PR it's "burstThroughput" etc. Please select one ordering.

FWIW, I prefer the "throughput burst" ordering because it keeps the config keys "compact-throughput" and "compact-throughput-burst" together visually and lexicographically.

@jacobmarble jacobmarble self-assigned this Jul 18, 2018
@zachgoldstein
Copy link
Contributor Author

@jacobmarble Great! Excited to see this one move forward. I've rebased against master and made those changes. Let me know if there's anything else to do.

I see the AppVeyor build failed as well, though I'm not sure how it could be related to changes in this PR. I see a message saying: unable to deduce repository and source type for "collectd.org".

@jacobmarble
Copy link
Member

@zachgoldstein thanks, I'll take a look now. I just restarted the AppVeyor failure, I think it was a transient network failure.

Can you describe your use case? Will this feature fix a problem for your employer?

@jacobmarble
Copy link
Member

@zachgoldstein please run gofmt against the change:

# github.com/influxdata/influxdb/tsdb
tsdb\store.go:239:64: syntax error: unexpected newline, expecting comma or )
tsdb\store.go:258:64: syntax error: unexpected newline, expecting comma or )

@zachgoldstein
Copy link
Contributor Author

@jacobmarble I'm a freelancer, so in a previous contract I was helping to build out some financial analysis software. We chose to use InfluxDB as the main store for market securities data and bumped into an issue where the machine running influx would continually cycle between consuming all the machine's memory, crashing, and then restarting. I outlined a bit of this in #9938. For us, the issue was severe enough that it was close to being a dealbreaker in using InfluxDB. At the time we worked around the issue by making our shard size much smaller, which seemed to really cut memory consumption, but migrating to that was quite time consuming. Once the contract finished, I looked into the issue a bit more deeply, tracked it down to a TSM compaction kicking off, and then submitted the issue and this PR.

So ya, I'm not fixing any immediate issue I'm experiencing, just trying to address something I bumped into before and help move things forward.

Gah! Sorry fixed up the formatting just now.

@zachgoldstein
Copy link
Contributor Author

@jacobmarble Is there anything else here you'd like me to fix up?

Copy link
Member

@jacobmarble jacobmarble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I went on vacation, thought we had tied this up. Here's a comment left pending for two weeks.

tsdb/config.go Outdated
// will allow TSM compactions to write to disk. Not that short bursts are allowed
// to happen at a possibly larger value, set by DefaultCompactThroughputBurst.
// A value of 0 here will disable compaction rate limiting
DefaultCompactThroughput = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the default where it was, at 48 * 1024 * 1024, for both throughput and burst. As before, we want the user to explicitly set zero for unlimited.

@zachgoldstein
Copy link
Contributor Author

@jacobmarble Gotcha no worries! Totally missed that one comment, fixed it up and rebased now.

@jacobmarble jacobmarble merged commit 0348221 into influxdata:master Aug 7, 2018
@ghost ghost removed the proposed label Aug 7, 2018
@jacobmarble
Copy link
Member

@zachgoldstein thanks again for this!

@mjiderhamn
Copy link

Are there any recommendations for how to come up with a sensible value for this setting, or is it only trial and error for your particular set of data and hardware?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TSM compaction triggers memory crash
4 participants