Cap ctst-end2end-sharded test step at 190 min#2454

Merged
bert-e merged 1 commit into development/2.15 from improvement/ZENKO-5306/ctst-step-timeout on Jun 30, 2026

@delthas

@delthas delthas commented Jun 30, 2026

Contributor

What

Add timeout-minutes: 190 to the Run CTST end to end tests step of the ctst-end2end-sharded job in .github/workflows/end2end.yaml. One-line change.

Why

The job has no job-level timeout and the test step has no per-step timeout, so it inherits GitHub's 360-min (6 h) default. When the CTST suite hangs, the only thing that stops it is job-level cancellation at 6 h — which grants the if: always() Archive and publish artifacts step only a ~5-minute grace window before the runner is SIGTERM'd, so the archive uploads partial logs or nothing. A per-step timeout-minutes fails only that step; the job is not cancelled and continues to the archive step, which runs with full remaining budget.
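As a rough illustration of the mechanism described above, the excerpt below sketches how a per-step `timeout-minutes` sits next to an `if: always()` archive step. The job name, step names, and the 190-minute value come from this PR; the `runs-on` label and both `run:` commands are hypothetical placeholders, since the real workflow contents are not shown here.

```yaml
# .github/workflows/end2end.yaml (illustrative excerpt, not the actual file)
jobs:
  ctst-end2end-sharded:
    runs-on: ubuntu-latest            # assumption: the real runner label is not shown in this PR
    steps:
      - name: Run CTST end to end tests
        timeout-minutes: 190          # per-step cap: on expiry only this step fails
        run: ./run-ctst.sh            # hypothetical placeholder for the real test command
      - name: Archive and publish artifacts
        if: always()                  # still runs after the step above fails or times out
        run: ./archive-artifacts.sh   # hypothetical placeholder; the real step may use an action
```

With this shape, a hung test step is stopped at 190 minutes while the job itself keeps running, so the archive step is not squeezed into the short cancellation grace window.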

Why 190

This step timeout is a hard backstop. The graceful stop is now done in-process by a 180-min cucumber-level time budget (ZENKO-5309), which lets cucumber finish and write its reports before this timeout would hard-kill it. So the ordering is: 180 min (graceful) < 190 min (hard backstop) < the old 360-min job cap.

190 still sits well above the observed max duration of passing runs — over the last ~400 runs, the passing step ran 61–155 min (p50 91, p95 129, max 155) — so it won't kill legitimately-slow passing runs.

Scope: ctst-end2end-sharded only — the other e2e jobs finish in 20–53 min and aren't prone to the multi-hour hang.

Issue: ZENKO-5306

@bert-e

bert-e commented Jun 30, 2026

Contributor

Hello delthas,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@scality scality deleted a comment from bert-e Jun 30, 2026
@bert-e

bert-e commented Jun 30, 2026

Contributor

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@delthas delthas requested review from a team, SylvainSenechal and benzekrimaha June 30, 2026 09:11

@SylvainSenechal SylvainSenechal left a comment

Contributor

ok but take a look at the cucumber documentation, there might be an option to allow for nicer shutdown than this

Edit: I think 2h30 could be acceptable too

@delthas

delthas commented Jun 30, 2026

Contributor Author

Highest we've seen is 155min (see my msg). But I'll lower it a bit then.

@delthas delthas force-pushed the improvement/ZENKO-5306/ctst-step-timeout branch from fae04a2 to 35e327c Compare June 30, 2026 13:13
Comment thread .github/workflows/end2end.yaml
@delthas delthas changed the title Cap ctst-end2end-sharded test step at 210 min Cap ctst-end2end-sharded test step at 190 min Jun 30, 2026
@delthas

delthas commented Jun 30, 2026

Contributor Author

Also added a cucumber-level timeout in #2457

The CTST test step had no per-step timeout, so a hung run was only
stopped by the 360-min job cap. Job cancellation gives the always()
archive step only a ~5-min grace window, too short to collect and upload
kind/pod logs, so hung runs lose their artifacts. A per-step timeout fails
just the step, not the job, so the archive step runs with full budget.

190 min is a hard backstop sitting just above the 180-min in-process
cucumber time budget (ZENKO-5309), which stops the run gracefully and
writes reports first. It stays well above the observed max duration of
passing runs (~155 min; p50 ~91, p95 ~129) measured over ~400 runs, so it
will not kill legitimately-slow passing runs.

Issue: ZENKO-5306
@delthas delthas force-pushed the improvement/ZENKO-5306/ctst-step-timeout branch from 35e327c to 3742254 Compare June 30, 2026 14:50
@delthas

delthas commented Jun 30, 2026

Contributor Author

/approve

@bert-e

bert-e commented Jun 30, 2026

Contributor

Build failed

The build for commit did not succeed in branch improvement/ZENKO-5306/ctst-step-timeout

The following options are set: approve

@bert-e

bert-e commented Jun 30, 2026

Contributor

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/2.15

The following branches have NOT changed:

  • development/2.10
  • development/2.11
  • development/2.12
  • development/2.13
  • development/2.14
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

This pull request did not target the following hotfix branch(es) so they
were left untouched:

  • hotfix/2.13.5

Please check the status of the associated issue ZENKO-5306.

Goodbye delthas.

The following options are set: approve

@bert-e bert-e merged commit 3742254 into development/2.15 Jun 30, 2026
37 of 38 checks passed
@bert-e bert-e deleted the improvement/ZENKO-5306/ctst-step-timeout branch June 30, 2026 17:41

4 participants