Improve metric summarization performance under heavy load #92

Merged
merged 1 commit into from Apr 30, 2013


2 participants

@comptonqc
Contributor

When computing summary metrics, gmetad's current behavior is for
each cluster thread to grab the summarization lock, perform all of
its summarization, and then release the lock. This can block the
global aggregation thread if a single cluster takes a long time
to summarize its RRDs for whatever reason; the issue manifests as
dropouts in the cluster and grid summary RRDs. It also comes up
if you have a large number of large clusters that each take a
long time to summarize.

The change here is to have each cluster thread perform its
summarization into a "pending" hash table independently, without
holding the lock. Once the summarization is complete, each thread
1) grabs the lock, 2) swaps the finished pending hash table with
the fully-aggregated one, and then 3) releases the lock
immediately. This significantly reduces the amount of time that
the summarization lock is held by each cluster thread, and also
eliminates the possibility of a single blocked cluster thread
preventing all summary metrics from being written.
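
For illustration only, here is a minimal sketch of the build-then-swap pattern described above. It is not the code in this commit: `summary_table_t`, `cluster_t`, `summarize_into_pending` and `publish_pending` are hypothetical names, and a two-field struct stands in for gmetad's real hash table.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cluster's summary hash table. */
typedef struct summary_table {
    double sum;      /* running total of a metric across hosts */
    unsigned num;    /* number of hosts that contributed */
} summary_table_t;

/* Hypothetical per-cluster state; not gmetad's actual struct. */
typedef struct cluster {
    summary_table_t *published;     /* what the aggregation thread reads */
    pthread_mutex_t *summary_lock;  /* the summarization lock */
} cluster_t;

/* Slow part: build the pending table WITHOUT holding the lock. */
summary_table_t *summarize_into_pending(const double *values, size_t n)
{
    summary_table_t *pending = calloc(1, sizeof(*pending));
    if (pending == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        pending->sum += values[i];
        pending->num++;
    }
    return pending;
}

/* Fast part: hold the lock only for a pointer swap.
 * Returns the previous table so the caller can free it
 * after the lock has been released. */
summary_table_t *publish_pending(cluster_t *c, summary_table_t *pending)
{
    assert(pending != NULL);
    pthread_mutex_lock(c->summary_lock);
    summary_table_t *old = c->published;
    c->published = pending;
    pthread_mutex_unlock(c->summary_lock);
    return old;
}
```

In this sketch the lock is held only for the duration of a pointer assignment, so a cluster thread that stalls while summarizing never holds it. The aggregation thread is assumed to take the same lock whenever it reads `published`.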

@comptonqc comptonqc Improve metric summarization performance under heavy load
0705a5d
@vvuksan vvuksan merged commit 8ff4bb3 into ganglia:master Apr 30, 2013

1 check passed

default The Travis build passed