When computing summary metrics, gmetad's current behavior is for
each cluster thread to grab the summarization lock, perform all of
its summarization, and then release the lock. This can block the
global aggregation thread if a single cluster is taking a long
time to summarize its RRDs for whatever reason; this issue
manifests as dropouts in the cluster and grid summary RRDs. The
same problem comes up if you have a large number of large
clusters that each take a long time to summarize.

The change here is to have each cluster thread perform its
summarization into a "pending" hash table independently, without
holding the lock. Once the summarization is complete, each thread
1) grabs the lock, 2) swaps the finished pending hash table in
place of the current summary hash table, and then 3) releases
the lock immediately.

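The three steps above can be sketched roughly as follows. This is
an illustration only, not the code in this change: the names
summary_table_t, source_t, summarize_into_pending, publish_pending
and read_summary are hypothetical, and a two-field struct stands in
for the real hash table. It assumes the aggregation thread copies
the published table while holding the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-cluster summary hash table. */
typedef struct {
    double sum;    /* sum of all values summarized */
    unsigned num;  /* number of values summarized */
} summary_table_t;

/* Hypothetical per-cluster state: two tables that trade places. */
typedef struct {
    summary_table_t buffers[2];
    summary_table_t *summary;  /* published; read by the aggregation thread */
    summary_table_t *pending;  /* private scratch table for the cluster thread */
} source_t;

/* A single global summarization lock shared by all threads. */
static pthread_mutex_t summary_lock = PTHREAD_MUTEX_INITIALIZER;

void source_init(source_t *s)
{
    assert(s != NULL);
    s->buffers[0] = (summary_table_t){0.0, 0};
    s->buffers[1] = (summary_table_t){0.0, 0};
    s->summary = &s->buffers[0];
    s->pending = &s->buffers[1];
}

/* Cluster thread: the slow part runs with no lock held. */
void summarize_into_pending(source_t *s, const double *values, size_t n)
{
    assert(s != NULL);
    s->pending->sum = 0.0;
    s->pending->num = 0;
    for (size_t i = 0; i < n; i++) {
        s->pending->sum += values[i];
        s->pending->num++;
    }
}

/* Cluster thread: 1) grab the lock, 2) swap pointers, 3) release. */
void publish_pending(source_t *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&summary_lock);
    summary_table_t *old = s->summary;
    s->summary = s->pending;
    s->pending = old;  /* reused as scratch space on the next pass */
    pthread_mutex_unlock(&summary_lock);
}

/* Aggregation thread: copy the published table under the lock. */
summary_table_t read_summary(source_t *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&summary_lock);
    summary_table_t copy = *s->summary;
    pthread_mutex_unlock(&summary_lock);
    return copy;
}
```

In this sketch the lock is held only for a pointer swap or a
struct copy, so a cluster thread that stalls inside
summarize_into_pending never holds the lock while it is stalled.
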
This significantly reduces the amount of time that the
summarization lock is held by each cluster thread, and also
eliminates the possibility of a single blocked cluster thread
preventing all summary metrics from being written.