Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

05574 d busy time metric #5798

Merged
merged 27 commits into from May 4, 2023
Merged
Show file tree
Hide file tree
Changes from 25 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
33 changes: 33 additions & 0 deletions platform-sdk/docs/base/metrics/busy-time-metric.md
lpetrovic05 marked this conversation as resolved.
Show resolved Hide resolved
@@ -0,0 +1,33 @@
### Background

There are a number of critical threads in our pipeline whose performance must be monitored closely. Investigating
performance issues often requires a lot of time to narrow down the issue.

### Problems with previous approaches

- In many places we track the average time something takes. This number by itself is not that useful, without the
knowledge how many times it has been executed on a particular thread.
- Metrics are often updated when the work is done. This means that if a thread is blocked, we have no insight into what
is going on. This also means that some work that takes a long time within one sampling period might be reported in a
subsequent sampling period, giving us misleading information.
- Some attempts have been made to unify this with cycle metrics. These are a collection of metrics updated through a
single instance.
- Since they all need to be updated through a single class, this would mean that all classes executing on a
particular thread need a dependency to it.
- Since they track cyclical work, they do not handle situations where a thread does different type of work

## Busy time metric

- All thread work is tracked by percentages (or fractions)
- All subtasks of a thread are tracked individually
- The metric tracks the current status of the thread, so that accurate information can be sampled even if the thread is
stuck
- it is lock-free and garbage-free. The metric implementation stores all relevant information within an integer
pair that can be updated atomically.
- This can track the overall busyness of a thread
- It can also track each part of the execution to get more detailed information about where a thread is spending its
time

### Example diagram

![](busy-time.svg)
4 changes: 4 additions & 0 deletions platform-sdk/docs/base/metrics/busy-time.svg
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions platform-sdk/docs/platformWiki.md
Expand Up @@ -15,6 +15,7 @@ This code is maintained by the "Platform Base" team.
- [Configuration](./base/configuration/configuration.md)
- Metrics
- Prometheus
- [Busy time metrics](./base/metrics/busy-time-metric.md)
- Logging
- Thread Management

Expand Down
Expand Up @@ -16,6 +16,7 @@

package com.swirlds.common.metrics.atomic;

import com.swirlds.common.utility.ByteUtils;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiFunction;
import java.util.function.IntBinaryOperator;
Expand All @@ -26,7 +27,6 @@
* Holds two integers that can be updated atomically
*/
public class AtomicIntPair {
public static final int INT_BITS = 32;
private static final int RESET_VALUE = 0;
private final AtomicLong container;
private final LongBinaryOperator operator;
Expand All @@ -46,9 +46,11 @@ public AtomicIntPair() {
*/
public AtomicIntPair(final IntBinaryOperator leftAccumulator, final IntBinaryOperator rightAccumulator) {
alittley marked this conversation as resolved.
Show resolved Hide resolved
operator = (current, supplied) -> {
final int left = leftAccumulator.applyAsInt(extractLeft(current), extractLeft(supplied));
final int right = rightAccumulator.applyAsInt(extractRight(current), extractRight(supplied));
return combine(left, right);
final int left =
leftAccumulator.applyAsInt(ByteUtils.extractLeftInt(current), ByteUtils.extractLeftInt(supplied));
final int right = rightAccumulator.applyAsInt(
ByteUtils.extractRightInt(current), ByteUtils.extractRightInt(supplied));
return ByteUtils.combineInts(left, right);
};
this.container = new AtomicLong(RESET_VALUE);
}
Expand All @@ -61,21 +63,21 @@ public AtomicIntPair(final IntBinaryOperator leftAccumulator, final IntBinaryOpe
* @param rightValue the value provided to the left integer
*/
public void accumulate(final int leftValue, final int rightValue) {
container.accumulateAndGet(combine(leftValue, rightValue), operator);
container.accumulateAndGet(ByteUtils.combineInts(leftValue, rightValue), operator);
}

/**
* @return the current value of the left integer
*/
public int getLeft() {
return extractLeft(container.get());
return ByteUtils.extractLeftInt(container.get());
}

/**
* @return the current value of the right integer
*/
public int getRight() {
return extractRight(container.get());
return ByteUtils.extractRightInt(container.get());
}

/**
Expand All @@ -86,7 +88,7 @@ public int getRight() {
*/
public double computeDouble(final ToDoubleBiFunction<Integer, Integer> compute) {
final long twoInts = container.get();
return compute.applyAsDouble(extractLeft(twoInts), extractRight(twoInts));
return compute.applyAsDouble(ByteUtils.extractLeftInt(twoInts), ByteUtils.extractRightInt(twoInts));
}

/**
Expand All @@ -106,8 +108,8 @@ public double computeDoubleAndReset(final ToDoubleBiFunction<Integer, Integer> c
*/
public double computeDoubleAndSet(
final ToDoubleBiFunction<Integer, Integer> compute, final int left, final int right) {
final long twoInts = container.getAndSet(combine(left, right));
return compute.applyAsDouble(extractLeft(twoInts), extractRight(twoInts));
final long twoInts = container.getAndSet(ByteUtils.combineInts(left, right));
return compute.applyAsDouble(ByteUtils.extractLeftInt(twoInts), ByteUtils.extractRightInt(twoInts));
}

/**
Expand All @@ -117,7 +119,7 @@ public double computeDoubleAndSet(
* @param right the right value to set
*/
public void set(final int left, final int right) {
container.set(combine(left, right));
container.set(ByteUtils.combineInts(left, right));
}

/**
Expand All @@ -129,23 +131,23 @@ public void set(final int left, final int right) {
*/
public <T> T compute(final BiFunction<Integer, Integer, T> compute) {
final long twoInts = container.get();
return compute.apply(extractLeft(twoInts), extractRight(twoInts));
return compute.apply(ByteUtils.extractLeftInt(twoInts), ByteUtils.extractRightInt(twoInts));
}

/**
* Same as {@link #compute(BiFunction)} but also atomically resets the integers to {@code 0}
alittley marked this conversation as resolved.
Show resolved Hide resolved
*/
public <T> T computeAndReset(final BiFunction<Integer, Integer, T> compute) {
final long twoInts = container.getAndSet(RESET_VALUE);
return compute.apply(extractLeft(twoInts), extractRight(twoInts));
return compute.apply(ByteUtils.extractLeftInt(twoInts), ByteUtils.extractRightInt(twoInts));
}

/**
* Same as {@link #compute(BiFunction)} but also atomically sets the integers to the provided values.
*/
public <T> T computeAndSet(final BiFunction<Integer, Integer, T> compute, final int left, final int right) {
lpetrovic05 marked this conversation as resolved.
Show resolved Hide resolved
final long twoInts = container.getAndSet(combine(left, right));
return compute.apply(extractLeft(twoInts), extractRight(twoInts));
final long twoInts = container.getAndSet(ByteUtils.combineInts(left, right));
return compute.apply(ByteUtils.extractLeftInt(twoInts), ByteUtils.extractRightInt(twoInts));
}

/**
Expand All @@ -154,16 +156,4 @@ public <T> T computeAndSet(final BiFunction<Integer, Integer, T> compute, final
public void reset() {
container.getAndSet(RESET_VALUE);
}

private static int extractLeft(final long pair) {
netopyr marked this conversation as resolved.
Show resolved Hide resolved
return (int) (pair >> INT_BITS);
}

private static int extractRight(final long pair) {
return (int) pair;
}

private static long combine(final int left, final int right) {
return (((long) left) << INT_BITS) | (right & 0xffffffffL);
}
}
lpetrovic05 marked this conversation as resolved.
Show resolved Hide resolved
@@ -0,0 +1,192 @@
/*
* Copyright (C) 2023 Hedera Hashgraph, LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.swirlds.common.metrics.extensions;

import com.swirlds.common.metrics.FloatFormats;
import com.swirlds.common.metrics.FunctionGauge;
import com.swirlds.common.metrics.Metrics;
import com.swirlds.common.time.IntegerEpochTime;
import com.swirlds.common.time.OSTime;
import com.swirlds.common.time.Time;
import com.swirlds.common.utility.ByteUtils;
import edu.umd.cs.findbugs.annotations.NonNull;
import java.util.concurrent.atomic.AtomicLong;

/**
* A metric that measures the fraction of time that a thread is busy. This metric could be used to track the overall busy
* time of a thread, or the busy time of a specific subtask. The granularity of this metric is in microseconds. A
* snapshot of this metric must be taken at least once every 34 minutes in order to get accurate data.
*/
public class BusyTime {
/** passed to the accumulator method to indicate that work has started */
private static final long WORK_START = 0;
/** passed to the accumulator method to indicate that work has ended */
private static final long WORK_END = 1;
/** the initial value of status when the instance is created */
private static final int INITIAL_STATUS = -1;
/** An instance that provides the current time */
private final IntegerEpochTime time;
/** Used to atomically update and reset the time and status */
private final AtomicLong accumulator;

/**
* The default constructor, uses the {@link OSTime} instance to get the current time
*/
public BusyTime() {
this(OSTime.getInstance());
}

/**
* A constructor where a custom {@link Time} instance could be supplied
*
* @param time
* provides the current time
*/
public BusyTime(@NonNull final Time time) {
this.time = new IntegerEpochTime(time);
this.accumulator = new AtomicLong(ByteUtils.combineInts(this.time.getMicroTime(), INITIAL_STATUS));
}

/**
* Adds a {@link FunctionGauge} to the given {@link Metrics} instance
*
* @param metrics
* the metrics instance to add the metric to
* @param category the kind of {@code Metric} (metrics are grouped or filtered by this)
* @param name a short name for the {@code Metric}
* @param description a one-sentence description of the {@code Metric}
*/
public void addMetric(
@NonNull final Metrics metrics,
@NonNull final String category,
@NonNull final String name,
@NonNull final String description) {
metrics.getOrCreate(new FunctionGauge.Config<>(category, name, Double.class, this::getAndReset)
.withDescription(description)
.withUnit("fraction")
.withFormat(FloatFormats.FORMAT_1_3));
}

/**
* Notifies the metric that work has started
*/
public void startingWork() {
accumulator.accumulateAndGet(WORK_START, this::statusUpdate);
lpetrovic05 marked this conversation as resolved.
Show resolved Hide resolved
}

/**
* Notifies the metric that work has finished
*/
public void finishedWork() {
accumulator.accumulateAndGet(WORK_END, this::statusUpdate);
}

/**
* @return the fraction of time that the thread has been busy, where 0.0 means the thread is not at all busy, and
* 1.0 means that the thread is 100% busy
*/
public double getBusyFraction() {
final long pair = accumulator.get();
return busyFraction(ByteUtils.extractLeftInt(pair), ByteUtils.extractRightInt(pair));
}

/**
* Same as {@link #getBusyFraction()} but also resets the metric
*
* @return the fraction of time that the thread has been busy, where 0.0 means the thread is not at all busy, and
* 1.0
* means that the thread is 100% busy
*/
public double getAndReset() {
final long pair = accumulator.getAndUpdate(this::reset);
return busyFraction(ByteUtils.extractLeftInt(pair), ByteUtils.extractRightInt(pair));
}

/**
* Gets the fraction of time this thread has been busy since the last reset
*
* @param measurementStart
* the micro epoch time when the last reset occurred
* @param status
* the current status of the thread and the time spent in the opposite status
* @return the fraction of time that the thread has been busy, where 0.0 means the thread is not at all busy, and
* 1.0
* means that the thread is 100% busy
*/
private double busyFraction(final int measurementStart, final int status) {
final int elapsedTime = time.microsElapsed(measurementStart);
if (elapsedTime == 0) {
return 0;
}
final int busyTime;
if (isIdle(status)) {
busyTime = Math.abs(status) - 1;
} else {
busyTime = elapsedTime - (status - 1);
}
return ((double) busyTime) / elapsedTime;
}

private boolean isIdle(final int status) {
return status < 0;
}

private long statusUpdate(final long previousPair, final long statusChange) {
// the epoch time when the last reset occurred
final int measurementStart = ByteUtils.extractLeftInt(previousPair);
// the current thread status is represented by the (+ -) sign. the number represents the time spent in the
// opposite status. this is that its time spent being busy/idle can be deduced whenever the sample is taken. if
// the time spend busy is X, and the measurement time is Y, then idle time is Y-X. since zero is neither
// positive nor negative, the values are offset by 1
final int currentStatus = ByteUtils.extractRightInt(previousPair);
// the current micro epoch time
final int currentTime = time.getMicroTime();

if ((statusChange == WORK_START && !isIdle(currentStatus))
|| (statusChange == WORK_END && isIdle(currentStatus))) {
// this means that the metric has not been updated correctly, we will not change the value
return previousPair;
lpetrovic05 marked this conversation as resolved.
Show resolved Hide resolved
}
// the time elapsed since the last reset
final int elapsedTime = IntegerEpochTime.elapsed(measurementStart, currentTime);
if (statusChange == WORK_START) {
// this means the thread was previously idle and now started working
// the time spent being busy beforehand
final int busyTime = Math.abs(currentStatus) - 1;
// the time spent idle is all the elapsed time minus the time spent busy
final int idleTime = elapsedTime - busyTime;
return ByteUtils.combineInts(measurementStart, idleTime + 1);
}
// this means the thread was previously busy and now stopped working
// the time spent being idle beforehand
final int idleTime = currentStatus - 1;
// the time spent busy is all the elapsed time minus the time spent idle
final int busyTime = elapsedTime - idleTime;
return ByteUtils.combineInts(measurementStart, -busyTime - 1);
}

private long reset(final long currentPair) {
return ByteUtils.combineInts(time.getMicroTime(), resetStatus(ByteUtils.extractRightInt(currentPair)));
}

private int resetStatus(final int status) {
if (status > 0) {
return 1;
}
return -1;
}
}