rptest: Refactor consumer validation and tune table idle settings #16250
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44166#018d37f8-1d56-4f2d-ad12-67cb9060e267
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44227#018d3d0c-2afa-482d-835e-713d10aae674
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44312#018d42d5-d04c-4b23-83e0-5fcf04130c1a
93cbf3d to 36adda4
36adda4 to 7c37a4f
@@ -128,6 +128,12 @@ def setup(self):
        table_env = StreamTableEnvironment.create(
            stream_execution_environment=env, environment_settings=settings)

        # Tune table idle state handling
        # Clear the state if it has not changed
Just curious: are these needed for this fix, or are they just nice to have?
This is a safeguard that triggers cleanups/aborts for ongoing actions and transformations. For example, we do not want Flink to wait indefinitely (the default) for the other party to answer when the Table API is querying something in RP.
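For illustration, a minimal sketch of what such a safeguard could look like. The diff above does not show which options or durations the PR actually sets, so the two option keys and both values here are assumptions, and `apply_idle_options` is a hypothetical helper. It assumes PyFlink 1.15 or newer, where `TableConfig.set(key, value)` is available.

```python
# Hypothetical values: the PR's real options and durations are not shown
# in the visible diff hunk.
IDLE_OPTIONS = {
    # Drop state for keys that have not been updated for this long
    # (Flink's default of 0 means state is never cleared).
    "table.exec.state.ttl": "60 s",
    # Mark a source as idle if it emits no records for this long
    # (Flink's default of 0 disables idleness detection).
    "table.exec.source.idle-timeout": "30 s",
}


def apply_idle_options(table_config, options=None):
    """Apply each option through table_config.set(key, value).

    table_config is expected to behave like PyFlink's TableConfig,
    i.e. to expose a set(key, value) method.
    """
    for key, value in (options or IDLE_OPTIONS).items():
        table_config.set(key, value)
    return table_config
```

With a real environment this would be called as `apply_idle_options(table_env.get_config())` right after `StreamTableEnvironment.create(...)`.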
/ci-repeat 1
/ci-repeat 1
Use internal jobmanager metrics to detect whether a job's subtasks (vertices) have been idle for 30 sec. The key metrics used are accumulated-idle-time and accumulated-busy-time.
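A rough sketch of the idea, not the PR's actual implementation. It assumes two snapshots of the job details response (as returned by Flink's `GET /jobs/<jobid>`), each carrying a `vertices` list whose entries have an `id` and a `metrics` object with `accumulated-idle-time` and `accumulated-busy-time` in milliseconds. The helper `vertex_times` and the snapshot-comparison approach are assumptions made for illustration.

```python
def vertex_times(job_details):
    """Map vertex id -> (idle_ms, busy_ms) from a job details dict."""
    times = {}
    for vertex in job_details.get("vertices", []):
        metrics = vertex.get("metrics", {})
        times[vertex["id"]] = (
            metrics.get("accumulated-idle-time", 0),
            metrics.get("accumulated-busy-time", 0),
        )
    return times


def is_job_idle(before, after, min_idle_ms=30_000):
    """Return True if, between the two snapshots, no vertex accumulated
    busy time and every vertex accumulated at least min_idle_ms of idle time."""
    old, new = vertex_times(before), vertex_times(after)
    # Without vertices, or with a changed vertex set, idleness cannot be judged.
    if not new or old.keys() != new.keys():
        return False
    for vertex_id, (idle_new, busy_new) in new.items():
        idle_old, busy_old = old[vertex_id]
        if busy_new > busy_old:
            return False
        if idle_new - idle_old < min_idle_ms:
            return False
    return True
```

One caveat: these per-vertex values may be aggregated across a vertex's subtasks, in which case the threshold would need to scale with parallelism. That should be checked against the Flink version in use.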
10a1656 to bd25808
EC2 check
/ci-repeat 1
new failures in https://buildkite.com/redpanda/redpanda/builds/44317#018d42dc-79df-4324-8242-621b6f201002:
new failures in https://buildkite.com/redpanda/redpanda/builds/44312#018d4672-d00c-4fc6-84f7-dd4cd71c7e0c:
Caught the reason for the above errors:
When Flink parallelizes INSERT operators into different jobs, it uses the same transactionId.
This is the result of this issue.
When Flink parallelizes jobs based on different INSERT operators, it uses the same sink.transactional-id-prefix that was set for the single temporary table. This causes fencing on the RP side and, as a result, flakiness when run in slower environments (docker). This is fixed by creating and deleting a temporary table for each batch.
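A sketch of the per-batch table approach, under stated assumptions: the helper names, the column schema, and the `<table>-<batch id>` prefix format are hypothetical and not taken from the PR. Only the connector options `sink.delivery-guarantee` and `sink.transactional-id-prefix` are real Flink Kafka connector settings. The point is that each batch gets its own temporary table with its own prefix, so parallel INSERT jobs do not fence each other.

```python
import uuid


def sink_table_ddl(table_name, topic, bootstrap_servers, batch_id=None):
    """Build DDL for a temporary Kafka sink table with a unique
    transactional id prefix per batch."""
    batch_id = batch_id or uuid.uuid4().hex[:8]
    prefix = f"{table_name}-{batch_id}"  # hypothetical prefix format
    return (
        f"CREATE TEMPORARY TABLE `{table_name}` (\n"
        "  `index` BIGINT,\n"      # hypothetical schema
        "  `payload` STRING\n"
        ") WITH (\n"
        "  'connector' = 'kafka',\n"
        f"  'topic' = '{topic}',\n"
        f"  'properties.bootstrap.servers' = '{bootstrap_servers}',\n"
        "  'format' = 'json',\n"
        "  'sink.delivery-guarantee' = 'exactly-once',\n"
        f"  'sink.transactional-id-prefix' = '{prefix}'\n"
        ")"
    )


def drop_table_ddl(table_name):
    """Build DDL that removes the temporary table once the batch is done."""
    return f"DROP TEMPORARY TABLE IF EXISTS `{table_name}`"
```

Per batch, the test would run `table_env.execute_sql(sink_table_ddl(...))`, execute its INSERT statements, and then run `table_env.execute_sql(drop_table_ddl(...))`.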
/ci-repeat 1
The copy operation sometimes does not finish copying and/or does not copy the file at all. Rely on ssh_output within the scope of the basic test.
abbd201 to 7506cef
/ci-repeat 1
        active = [
            job for job in jobs['jobs']
            if job['status'] in self.job_active_statuses
        ]
        return active

    def _has_active_jobs(self):
    def is_job_idle(self, jobid) -> bool:
This is way too complicated lol. I can't imagine the Flink API is this horrible; we are probably missing something.
Since Table API consumer waits for the index indefinitely, update validation to
This PR also fixes log copying for docker envs by properly getting the hostname.
Fixes: redpanda-data/devprod#1031
Backports Required
Release Notes