rptest: Transaction workload with scaling in mind #16148

Merged
merged 4 commits into from
Jan 31, 2024

savex
Contributor

@savex savex commented Jan 18, 2024

Implements a test and workload that can be scaled to any number of tasks and jobs inside Flink, with no per-job overhead at runtime compared to the Table API.

A Java version of a similar workload is here.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

@savex savex self-assigned this Jan 18, 2024
@savex savex force-pushed the dp-1013-multi-workload-test branch 2 times, most recently from 5b77065 to 75a957b Compare January 18, 2024 22:43
@savex
Contributor Author

savex commented Jan 19, 2024

2M events from a single node, about 524k per job (high watermark 524577 on the first topic).
One topic per job, 4 in total.
Total number of events: 2 * 1024 * 1024 = 2097152

Test run details

[INFO  - 2024-01-19 00:26:23,447 - flink_scale_test - test_transactions_scale_single_node - lineno:160]: Topic: flink_scale_0, watermark: 524577
[INFO  - 2024-01-19 00:26:23,447 - flink_scale_test - test_transactions_scale_single_node - lineno:160]: Topic: flink_scale_1, watermark: 524481
[INFO  - 2024-01-19 00:26:23,447 - flink_scale_test - test_transactions_scale_single_node - lineno:160]: Topic: flink_scale_2, watermark: 524481
[INFO  - 2024-01-19 00:26:23,447 - flink_scale_test - test_transactions_scale_single_node - lineno:160]: Topic: flink_scale_3, watermark: 524465
[INFO  - 2024-01-19 00:26:23,447 - flink_scale_test - test_transactions_scale_single_node - lineno:162]: Total messages/High watermark sum: 2098004
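
The figures in this log can be cross-checked with a little arithmetic. The sketch below uses only values copied from the log and compares the expected event count with the sum of the per-topic high watermarks. The surplus is presumably due to transaction control markers, which also occupy offsets; that explanation is an assumption and is not stated in the log. The final check is a hypothetical illustration, not the assertion used in the test.

```python
# Values copied from the test run log above.
expected_events = 2 * 1024 * 1024
watermarks = {
    "flink_scale_0": 524577,
    "flink_scale_1": 524481,
    "flink_scale_2": 524481,
    "flink_scale_3": 524465,
}

# Even share of events per job, assuming the load is split equally.
per_job = expected_events // len(watermarks)
total_hwm = sum(watermarks.values())
surplus = total_hwm - expected_events

print(f"expected events:    {expected_events}")
print(f"expected per job:   {per_job}")
print(f"high watermark sum: {total_hwm}")
print(f"offsets beyond the data records: {surplus}")

# Hypothetical check: every topic holds at least its share of events.
assert all(hwm >= per_job for hwm in watermarks.values())
```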

RP Metrics for produce only:

Every 2.0s: curl -s http://ip-172-31-12-203:9644/metrics | grep vectorized_internal_rpc_latency_sum | grep commit_tx    ip-172-31-24-235: Fri Jan 19 00:26:32 2024

vectorized_internal_rpc_latency_sum{method="commit_tx",service="tx_gateway",shard="0"} 37655
vectorized_internal_rpc_latency_sum{method="commit_tx",service="tx_gateway",shard="1"} 32250
vectorized_internal_rpc_latency_sum{method="commit_tx",service="tx_gateway",shard="2"} 30668
vectorized_internal_rpc_latency_sum{method="commit_tx",service="tx_gateway",shard="3"} 19419
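
For reference, the same filtering can be done in Python instead of `grep`. This is a minimal sketch that parses the sample lines shown above and groups the `commit_tx` latency sum by shard. The helper names (`parse_samples`, `commit_tx_latency_by_shard`) are hypothetical and not part of rptest, and the parser handles only simple labelled samples like these, not the full Prometheus text format.

```python
import re

# Sample copied from the metrics output above.
METRICS_TEXT = """\
vectorized_internal_rpc_latency_sum{method="commit_tx",service="tx_gateway",shard="0"} 37655
vectorized_internal_rpc_latency_sum{method="commit_tx",service="tx_gateway",shard="1"} 32250
vectorized_internal_rpc_latency_sum{method="commit_tx",service="tx_gateway",shard="2"} 30668
vectorized_internal_rpc_latency_sum{method="commit_tx",service="tx_gateway",shard="3"} 19419
"""

# Matches one labelled sample line: name{labels} value
SAMPLE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simple parser does not understand
        labels = dict(LABEL_RE.findall(match["labels"]))
        yield match["name"], labels, float(match["value"])


def commit_tx_latency_by_shard(text):
    """Return {shard: latency_sum} for the commit_tx RPC method."""
    return {
        int(labels["shard"]): value
        for name, labels, value in parse_samples(text)
        if name == "vectorized_internal_rpc_latency_sum"
        and labels.get("method") == "commit_tx"
    }


by_shard = commit_tx_latency_by_shard(METRICS_TEXT)
total = sum(by_shard.values())
print(by_shard)
print(f"total across shards: {total}")
```

In a live run, `METRICS_TEXT` would be replaced by the body fetched from the node's `/metrics` endpoint.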

@savex savex marked this pull request as ready for review January 19, 2024 23:34
@savex savex requested a review from bharathv January 19, 2024 23:38
@vbotbuildovich
Collaborator

vbotbuildovich commented Jan 22, 2024

@savex savex force-pushed the dp-1013-multi-workload-test branch 2 times, most recently from 48d196e to efe38f1 Compare January 24, 2024 22:03
@vbotbuildovich
Collaborator

vbotbuildovich commented Jan 25, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/44246#018d3dbe-29ac-477d-90a4-51f0064be700:

"rptest.tests.cloud_storage_timing_stress_test.CloudStorageTimingStressTest.test_cloud_storage_with_partition_moves.cleanup_policy=delete"

new failures in https://buildkite.com/redpanda/redpanda/builds/44454#018d57c0-fa81-4395-aa78-be822b8248b9:

"rptest.tests.flink_basic_test.FlinkBasicTests.test_transaction_workload"

@savex savex requested a review from bharathv January 26, 2024 21:37
Contributor

@bharathv bharathv left a comment

lgtm after fixing the commit history.

@savex savex force-pushed the dp-1013-multi-workload-test branch from 289e5e6 to 621007f Compare January 29, 2024 23:07
@savex savex requested a review from bharathv January 29, 2024 23:07
    Test uses a simple workload with the NumberSequence class, which
    generates messages with a single index number as content. The goal
    is to generate as many transactions as possible in minimum time,
    with a single transformation applied at produce time (int -> str),
    which happens on the Flink side.
    The test runs on a single node and can run several workloads in
    parallel, with a configurable parameter that controls whether they
    share a single topic or each workload gets a separate topic.
    The test also asserts on metrics and the high watermark from RP.
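
The topic layout described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from `flink_scale_test.py`: the function names, the `separate_topics` parameter, and the choice of `flink_scale_0` as the shared topic name are hypothetical. Only the `flink_scale_<n>` naming pattern comes from the test log in this conversation.

```python
def topic_names(job_count, separate_topics, prefix="flink_scale"):
    """Return the topic each workload writes to (hypothetical helper).

    separate_topics=True gives every workload its own topic;
    separate_topics=False points all workloads at one shared topic.
    """
    if separate_topics:
        return [f"{prefix}_{i}" for i in range(job_count)]
    # Assumption: the shared topic reuses the first per-job name.
    return [f"{prefix}_0"] * job_count


def to_message(index):
    """The single transformation applied at produce time (int -> str)."""
    return str(index)


if __name__ == "__main__":
    print(topic_names(4, separate_topics=True))
    print(topic_names(4, separate_topics=False))
    print([to_message(i) for i in range(3)])
```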
@savex savex force-pushed the dp-1013-multi-workload-test branch from 621007f to fa5771b Compare January 29, 2024 23:46
@savex
Contributor Author

savex commented Jan 29, 2024

Squashed review fixes and rebased to dev

@savex
Contributor Author

savex commented Jan 30, 2024

Bumped the timeout because the Docker environment in CDT is too slow when running the whole test suite. Also added a comment about the detect_idle_jobs variable.

@bharathv
Contributor

/ci-repeat 2
release
skip-unit
skip-redpanda-build
dt-repeat=20
tests/rptest/tests/flink_basic_test.py
tests/rptest/e2e_tests/flink_scale_test.py

@bharathv
Contributor

/ci-repeat 2
release
skip-unit
dt-repeat=20
tests/rptest/tests/flink_basic_test.py
tests/rptest/e2e_tests/flink_scale_test.py

@bharathv bharathv merged commit f90c5f0 into dev Jan 31, 2024
17 checks passed
@bharathv bharathv deleted the dp-1013-multi-workload-test branch January 31, 2024 05:42
3 participants