Track cache, WAL, filestore stats within tsm1 engine #5758

Merged
merged 12 commits into from
Feb 22, 2016

mark-rushakoff
Contributor

This PR adds stats to track disk usage for the tsm1 FileStore and WAL, and disk+memory for the Cache. The stats are tracked per-engine, not per-file.

During manual testing, the stats seem to be consistent with file sizes on disk, inspected out-of-band from the influxd process.

@mark-rushakoff
Contributor Author

I didn't run tests locally with -race enabled, so I'll eventually fix those and amend this PR. Happy to implement any other feedback along the way.

jonseymour and others added 4 commits February 20, 2016 22:18
Complementing and extending the changes in #5758.

Add 2 level statistics:

  * snapshotCount
  * cacheAgeMs

Add 2 counter statistics:

  * cachedBytes
  * WALCompactionTimeMs

snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate.

cacheAgeMs can be used to gauge the level of write activity into the cache.

The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates.

The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used to calculate WAL compaction throughput.

The ratio of the difference between the first and last WAL compaction times to the
interval length is an estimate of the percentage of cache throughput consumed.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
tsm: cache: add cache throughput related statistics.
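
The ratios described in this commit message could be computed from two samples of the counters roughly as follows. This is a minimal sketch, not code from the PR: the `sample` struct, its `atMs` timestamp field, and the three helper functions are hypothetical names introduced for illustration; only the statistic names come from the commit message.

```go
package main

import "fmt"

// sample is a hypothetical snapshot of counters read from the tsm1_cache
// measurement at one point in time. It is not an InfluxDB type.
type sample struct {
	atMs                int64 // time the sample was taken, in milliseconds
	cachedBytes         int64
	diskBytes           int64
	walCompactionTimeMs int64
}

// cacheThroughput returns bytes per second written into the cache between
// two samples, from the difference in cachedBytes.
func cacheThroughput(a, b sample) float64 {
	elapsedMs := b.atMs - a.atMs
	if elapsedMs <= 0 {
		return 0
	}
	return float64(b.cachedBytes-a.cachedBytes) * 1000 / float64(elapsedMs)
}

// compactionThroughput applies the ratio from the commit message to one
// sample: (cachedBytes-diskBytes)/WALCompactionTimeMs, in bytes per ms.
func compactionThroughput(s sample) float64 {
	if s.walCompactionTimeMs == 0 {
		return 0
	}
	return float64(s.cachedBytes-s.diskBytes) / float64(s.walCompactionTimeMs)
}

// compactionTimeRatio returns the change in WALCompactionTimeMs between two
// samples divided by the length of the sampling interval.
func compactionTimeRatio(a, b sample) float64 {
	elapsedMs := b.atMs - a.atMs
	if elapsedMs <= 0 {
		return 0
	}
	return float64(b.walCompactionTimeMs-a.walCompactionTimeMs) / float64(elapsedMs)
}

func main() {
	first := sample{atMs: 0, cachedBytes: 1000}
	last := sample{atMs: 10000, cachedBytes: 6000, diskBytes: 1000, walCompactionTimeMs: 2500}
	fmt.Printf("cache throughput: %.1f bytes/s\n", cacheThroughput(first, last))
	fmt.Printf("compaction throughput: %.1f bytes/ms\n", compactionThroughput(last))
	fmt.Printf("compaction time ratio: %.2f\n", compactionTimeRatio(first, last))
}
```

The helpers return 0 for an empty interval or a zero compaction time so that a consumer never divides by zero when the counters have not moved yet.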
The intent of this change is to ensure that all statistic fields of the
resulting tsm1_cache measurement are initialized on initialization of
the cache. That way, any consumer of those measurements doesn't
have to deal with the null case.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
tsm: cache: ensure all statistics are initialised on cache creation.
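
One way to get the "no null case" property described above is to create every key with a zero value when the statistics map is built. The sketch below uses the standard library's `expvar.Map`; the `newCacheStats` helper and the exact list of names are assumptions for illustration, not the code in this commit.

```go
package main

import (
	"expvar"
	"fmt"
)

// cacheStatNames is an illustrative list built from names mentioned in this
// PR; the real set of tsm1_cache fields may differ.
var cacheStatNames = []string{
	"memSize",
	"diskBytes",
	"snapshotCount",
	"cacheAgeMs",
	"cachedBytes",
	"WALCompactionTimeMs",
}

// newCacheStats (hypothetical helper) returns an unpublished expvar.Map in
// which every statistic already exists with the value 0.
func newCacheStats() *expvar.Map {
	m := new(expvar.Map).Init()
	for _, name := range cacheStatNames {
		// Add creates the key as an *expvar.Int if it is not present yet.
		m.Add(name, 0)
	}
	return m
}

func main() {
	stats := newCacheStats()
	// The map renders as JSON with all fields present from the start.
	fmt.Println(stats.String())
}
```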
@mark-rushakoff mark-rushakoff changed the title Track disk stats within tsm1 engine Track cache, WAL, filestore stats within tsm1 engine Feb 21, 2016
jonseymour and others added 2 commits February 22, 2016 08:26
Since we are not locking but relying on atomic arithmetic,
use Add rather than Set. Will also result in slightly less garbage
being created.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
tsm: cache: during writes, update the memSize statistic outside the lock
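
The pattern in this commit can be sketched as below: the map mutation stays under the mutex, while the size statistic is bumped afterwards with a single atomic `Add`. A read-then-`Set` sequence outside the lock could lose updates when two writers interleave, whereas `Add` applies each delta atomically. The `cache` type and the `writeConcurrently` helper here are simplified stand-ins, not the tsm1 implementation.

```go
package main

import (
	"expvar"
	"fmt"
	"sync"
)

// cache is a simplified stand-in for the engine cache.
type cache struct {
	mu      sync.Mutex
	store   map[string][]byte
	memSize *expvar.Int // statistic updated without holding mu
}

func newCache() *cache {
	return &cache{store: make(map[string][]byte), memSize: new(expvar.Int)}
}

// write appends val under the lock, then updates the statistic outside it.
func (c *cache) write(key string, val []byte) {
	c.mu.Lock()
	c.store[key] = append(c.store[key], val...)
	c.mu.Unlock()

	// expvar.Int.Add is atomic, so no lock is needed for the counter.
	c.memSize.Add(int64(len(val)))
}

// writeConcurrently issues one write per goroutine and waits for all of them.
func writeConcurrently(c *cache, writers int, val []byte) {
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.write(fmt.Sprintf("key-%d", i%4), val)
		}(i)
	}
	wg.Wait()
}

func main() {
	c := newCache()
	writeConcurrently(c, 100, []byte("0123456789"))
	fmt.Println(c.memSize.Value()) // 100 writers * 10 bytes = 1000
}
```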
@jonseymour
Contributor

@mark-rushakoff - I wonder if we want to do something about suppressing stats from the snapshot caches? For example...

[Image: snapshots]

I'll submit a PR with my suggestion about how to do this.

…ler constructor

The intent of this change is to avoid writing statistics for caches
created as snapshot cache instances into the tsm1_cache measurement. We
can do this by avoiding use of the NewCache constructor. All other methods
are only intended to be called on the engine cache, never
on a snapshot.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
@jonseymour
Contributor

My suggestion for this is found in #5778.

@jonseymour
Contributor

@mark-rushakoff There doesn't seem to be a way to stop publishing statistics, so cache statistics live on even after the shard has been closed, or even after the database that contains them has been deleted.

Is there some way to stop a statistics map being published when it is no longer in use?

@mark-rushakoff
Contributor Author

@jonseymour we're using the expvar package directly, which does not expose a way to remove or unpublish a stat. Internally there's been a conversation or two about this but we haven't discussed it in depth yet. For now, I think it's fine to have the "dead" stats around, although I'd certainly like to address it before the 0.11 release.
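
For readers unfamiliar with the limitation, here is a small sketch of the standard library behaviour being described: `expvar.Publish` registers a name for the lifetime of the process, there is no top-level call to remove it, and publishing the same name again panics. The stat name and the `publishCounter` and `isPublished` helpers are hypothetical and are not how InfluxDB registers its statistics.

```go
package main

import (
	"expvar"
	"fmt"
)

// publishCounter (hypothetical helper) publishes a new counter under name and
// reports whether that succeeded. expvar.Publish panics if the name is
// already registered, so the panic is recovered and reported as false.
func publishCounter(name string) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			ok = false
		}
	}()
	expvar.Publish(name, new(expvar.Int))
	return true
}

// isPublished reports whether a variable is registered under name.
func isPublished(name string) bool {
	return expvar.Get(name) != nil
}

func main() {
	const name = "example_cache_stats" // illustrative name only

	fmt.Println(publishCounter(name)) // first registration succeeds
	fmt.Println(publishCounter(name)) // second registration is rejected
	fmt.Println(isPublished(name))    // the first one is still visible
}
```

The package logs the rejected registration before panicking, so the second call also writes a message to standard error.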

@mark-rushakoff
Contributor Author

@jwilder This PR should be ready for review now.

@jonseymour
Contributor

@mark-rushakoff is it ok for me to raise an issue regarding cleaning up "idle" statistics keys?

@mark-rushakoff
Contributor Author

Sure, feel free to open a separate issue. Thanks.

@jwilder jwilder added this to the 0.11.0 milestone Feb 22, 2016
@jwilder
Contributor

jwilder commented Feb 22, 2016

Needs a changelog update. 👍 otherwise.

mark-rushakoff added a commit that referenced this pull request Feb 22, 2016
Track cache, WAL, filestore stats within tsm1 engine
@mark-rushakoff mark-rushakoff merged commit fc5c859 into master Feb 22, 2016
@mark-rushakoff mark-rushakoff deleted the mr-disk-stats branch February 22, 2016 21:01
jonseymour added a commit to jonseymour/influxdb that referenced this pull request Feb 29, 2016