Update queried metric to determine tps in system tests #31531

steviez · 2023-05-08T06:56:01Z

Problem

The test report is querying a metric that has been removed for being too noisy.

Summary of Changes

So, update the query to use a metric that we report once per slot. The metric bank-process_transactions was removed in #31398.

As for the two metrics ...

bank-process_transactions was a counter that was updated for any transactions from any slot
replay-slot-stats is submitted when a slot is completed

So, replay-slot-stats is inherently excluding transactions from discarded blocks (ie one that are abandoned and result in TooFewTicks). Looking at a previous test run before the metric was pulled out, I see the following values when I set dashboard time to 2023-05-03 06:23 ==> 2023-05-03 06:33:

    // New mean query: 21,122
    SELECT ROUND(MEAN("median_sum")) as "mean_tps" FROM (
      SELECT MEDIAN(sum_total_transactions) AS "median_sum" FROM (
        SELECT SUM("total_transactions") AS "sum_total_transactions"
          FROM "bare-perf-cpu-only"."autogen"."replay-slot-stats"
          WHERE time > :dashboardTime: AND time < :upperDashboardTime:
          GROUP BY time(1s), host_id)
      GROUP BY time(1s)
    )

    // Old mean query: 21,191
    SELECT ROUND(MEAN("median_sum")) as "mean_tps" FROM (
      SELECT MEDIAN(sum_count) AS "median_sum" FROM (
        SELECT SUM("count") AS "sum_count"
          FROM "bare-perf-cpu-only"."autogen"."bank-process_transactions"
          WHERE time > :dashboardTime: AND time < :upperDashboardTime: AND count > 0
          GROUP BY time(1s), host_id)
      GROUP BY time(1s)
    )

    // New max query: 80,262
    SELECT MAX("median_sum") as "max_tps" FROM (
      SELECT MEDIAN(sum_total_transactions) AS "median_sum" FROM (
        SELECT SUM("total_transactions") AS "sum_total_transactions"
          FROM "bare-perf-cpu-only"."autogen"."replay-slot-stats"
          WHERE time > :dashboardTime: AND time < :upperDashboardTime: AND total_transactions > 0
          GROUP BY time(1s), host_id)
      GROUP BY time(1s)
    )

    // Old max query: 71,359
    SELECT MAX("median_sum") as "max_tps" FROM (
      SELECT MEDIAN(sum_count) AS "median_sum" FROM (
        SELECT SUM("count") AS "sum_count"
          FROM "bare-perf-cpu-only"."autogen"."bank-process_transactions"
          WHERE time > :dashboardTime: AND time < :upperDashboardTime: AND count > 0
          GROUP BY time(1s), host_id)
      GROUP BY time(1s)
    )

The difference in means looks negligible
- Also as mentioned, the per-slot metric doesn't count abandoned slots, but it looks like there was only one abandoned slot in this particular run.
The difference in maxes is more significant, but I don't think this is problematic
- We're grouping things by 1s, so we seemingly had multiple blocks finish being replayed within the 1 second window. The per-slot metric includes all transactions for that slot, even though some of the tx's might have technically been replayed outside of the 1 second window.

The test report is querying a metric that has been removed for being too noisy. So, update the query to use a metric that we report once per slot.

yihau

just tested this PR on local-cluster and I think the result looks good

Test Configuration:

CLOUD_PROVIDER = bare
NUMBER_OF_VALIDATOR_NODES = 5
ENABLE_GPU = false
NUMBER_OF_CLIENT_NODES = 1
CLIENT_OPTIONS = bench-tps=1=--tx_count 10000 --thread-batch-sleep-ms 250
TEST_DURATION_SECONDS = 600
USE_PUBLIC_IP_ADDRESSES = false
ALLOW_BOOT_FAILURES = false
TEST_TYPE = fixed_duration

Result Details:

mean_tps: 24533
max_tps: 67469
mean_confirmation_ms: 804
max_confirmation_ms: 4286
99th_percentile_confirmation_ms: 2669
max_tower_distance: 43
last_tower_distance: 31
slots_per_second: 2.651

apfitzge

lgtm

Update queried metric to determine tps in system tests

556b7f0

The test report is querying a metric that has been removed for being too noisy. So, update the query to use a metric that we report once per slot.

steviez requested review from yihau and apfitzge May 8, 2023 06:56

yihau approved these changes May 8, 2023

View reviewed changes

apfitzge approved these changes May 8, 2023

View reviewed changes

steviez merged commit 0d9783c into solana-labs:master May 8, 2023
16 checks passed

steviez deleted the system_test_tps branch May 8, 2023 15:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update queried metric to determine tps in system tests #31531

Update queried metric to determine tps in system tests #31531

steviez commented May 8, 2023 •

edited

yihau left a comment

apfitzge left a comment

Update queried metric to determine tps in system tests #31531

Update queried metric to determine tps in system tests #31531

Conversation

steviez commented May 8, 2023 • edited

Problem

Summary of Changes

yihau left a comment

Choose a reason for hiding this comment

apfitzge left a comment

Choose a reason for hiding this comment

steviez commented May 8, 2023 •

edited