trusted-execution-clusters: Verbose SSH attempt#77892
trusted-execution-clusters: Verbose SSH attempt#77892Jakob-Naucke wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThe provisioning script now hardens SSH usage: disables password auth, checks TCP reachability to the host, ensures an ssh-agent with the private key is running, and uses verbose SSH client output during the existing retry loop for connection tests. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 12✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
51848ec to
b7043a3
Compare
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse network-access-allowed |
|
@alicefr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
bfd29e8 to
12221f9
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Jakob-Naucke The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
ci-operator/step-registry/trusted-execution-clusters/ref/operator/beaker-kind-provision/trusted-execution-clusters-ref-operator-beaker-kind-provision-commands.sh (2)
166-170: ssh-agent setup is redundant with-iand leaves an orphan agent.
SSHOPTSalready pins the identity via-i "${SSH_PKEY}", sosshwill use that key directly; starting an agent andssh-add-ing the same key doesn't change authentication behavior — it just adds another thing that can fail silently (there is noset -e, so a brokenssh-agent/ssh-addwould go unnoticed). The spawned agent is also never killed, so it lingers for the rest of the job.
wc -l "${SSH_PKEY}"is harmless but doesn't really add diagnostic value beyond checking the file is non-empty / well-formed;ssh-keygen -y -f "${SSH_PKEY}" >/dev/nullwould actually validate the key is parseable, which is a more useful signal for the symptom you're chasing.Optional cleanup if you want to keep the agent path:
♻️ Alternative
-log_info "Testing SSH connection to ${BEAKER_USER}@${BEAKER_IP}..." -eval "$(ssh-agent -s)" -ssh-add "${SSH_PKEY}" - -log_info "SSH private key line count for checking" -wc -l "${SSH_PKEY}" +log_info "Testing SSH connection to ${BEAKER_USER}@${BEAKER_IP}..." +log_info "Validating SSH private key is parseable..." +if ! ssh-keygen -y -f "${SSH_PKEY}" >/dev/null; then + log_error "SSH private key at ${SSH_PKEY} failed to parse" + exit 1 +fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ci-operator/step-registry/trusted-execution-clusters/ref/operator/beaker-kind-provision/trusted-execution-clusters-ref-operator-beaker-kind-provision-commands.sh` around lines 166 - 170, The ssh-agent/ssh-add lines are redundant because SSHOPTS already uses -i "${SSH_PKEY}", and they spawn an orphan agent; remove the eval "$(ssh-agent -s)" and ssh-add "${SSH_PKEY}" calls and instead validate the private key file referenced by SSH_PKEY (replace the wc -l check) by invoking ssh-keygen -y -f "${SSH_PKEY}" >/dev/null to ensure the key is parseable; if you must keep agent behavior, ensure you properly record and kill the agent (SSH_AUTH_SOCK/PID) after use so it doesn't linger.
179-179:ssh -vvvis the right debug hammer — plan to walk it back.Fine for the investigation this PR is opening, but a couple of things worth knowing so this doesn't become permanent noise:
-vvveffectively overrides theLogLevel=ERRORthat remains insideSSHOPTS, so once the root cause is understood you'll want to drop either the-vvvor the staleLogLevel=ERRORto keep the configuration coherent.- Every retry now emits a full
ssh -vvvtrace to the job log (up to 15 of them), which on failure will produce a lot of output to sift through. If you plan to keep verbose output post-investigation, consider guarding it (e.g.SSH_VERBOSE=${SSH_VERBOSE:-}→ssh ${SSH_VERBOSE:+-vvv} ...) so normal runs stay readable.Not blocking — just flagging so the diagnostic mode doesn't silently become the default.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ci-operator/step-registry/trusted-execution-clusters/ref/operator/beaker-kind-provision/trusted-execution-clusters-ref-operator-beaker-kind-provision-commands.sh` at line 179, The ssh call currently forces -vvv debug output ("ssh -vvv ${SSHOPTS[@]} ${BEAKER_USER}@${BEAKER_IP} ...") which overrides the LogLevel=ERROR in SSHOPTS and will spam logs on retries; change it so verbose debugging is opt-in (remove the hardcoded -vvv) and either use a guard like SSH_VERBOSE to expand to -vvv only when set (e.g. ssh ${SSH_VERBOSE:+-vvv} "${SSHOPTS[@]}" "${BEAKER_USER}@${BEAKER_IP}" ...) or remove the stale LogLevel=ERROR from SSHOPTS so behavior is consistent; update the call that references SSHOPTS, BEAKER_USER, and BEAKER_IP accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/beaker-kind-provision/trusted-execution-clusters-ref-operator-beaker-kind-provision-commands.sh`:
- Around line 156-160: The TCP probe using nc should be made defensive: first
test for availability of nc or ncat (e.g., via command -v) and log a clear
message if neither exists; when invoking nc/ncat in the if condition (the block
referencing BEAKER_IP and port 22), capture stderr by redirecting 2>&1 so
diagnostic output appears in logs; if no nc/ncat is found, fall back to a bash
/dev/tcp probe (using the BEAKER_IP variable) and use its exit status for the
same success/fail branch, ensuring the existing log_success/log_warn messages
are preserved and augmented with the captured diagnostic output when available.
---
Nitpick comments:
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/beaker-kind-provision/trusted-execution-clusters-ref-operator-beaker-kind-provision-commands.sh`:
- Around line 166-170: The ssh-agent/ssh-add lines are redundant because SSHOPTS
already uses -i "${SSH_PKEY}", and they spawn an orphan agent; remove the eval
"$(ssh-agent -s)" and ssh-add "${SSH_PKEY}" calls and instead validate the
private key file referenced by SSH_PKEY (replace the wc -l check) by invoking
ssh-keygen -y -f "${SSH_PKEY}" >/dev/null to ensure the key is parseable; if you
must keep agent behavior, ensure you properly record and kill the agent
(SSH_AUTH_SOCK/PID) after use so it doesn't linger.
- Line 179: The ssh call currently forces -vvv debug output ("ssh -vvv
${SSHOPTS[@]} ${BEAKER_USER}@${BEAKER_IP} ...") which overrides the
LogLevel=ERROR in SSHOPTS and will spam logs on retries; change it so verbose
debugging is opt-in (remove the hardcoded -vvv) and either use a guard like
SSH_VERBOSE to expand to -vvv only when set (e.g. ssh ${SSH_VERBOSE:+-vvv}
"${SSHOPTS[@]}" "${BEAKER_USER}@${BEAKER_IP}" ...) or remove the stale
LogLevel=ERROR from SSHOPTS so behavior is consistent; update the call that
references SSHOPTS, BEAKER_USER, and BEAKER_IP accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5f765fc6-1bc3-461a-9b13-da08d521beee
📒 Files selected for processing (1)
ci-operator/step-registry/trusted-execution-clusters/ref/operator/beaker-kind-provision/trusted-execution-clusters-ref-operator-beaker-kind-provision-commands.sh
| if nc -zv "${BEAKER_IP}" 22; then | ||
| log_success "Beaker machine ${BEAKER_IP} accepts on port 22" | ||
| else | ||
| log_warn "Beaker machine ${BEAKER_IP} does not accept on port 22, trying SSH anyhow" | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether `nc` is expected to exist in the image(s) used by this step.
# Look for any image/Dockerfile references or explicit nc installs in the repo.
fd -t f -e Dockerfile
rg -nP '\bnc(at)?\b' -g '!**/vendor/**' -C2Repository: openshift/release
Length of output: 50373
🏁 Script executed:
# Find the metadata for the beaker-kind-provision step
find . -path "*trusted-execution-clusters*ref*operator*beaker-kind-provision*" -name "*.metadata.json" -o -path "*trusted-execution-clusters*ref*operator*beaker-kind-provision*" -name "*.yaml"Repository: openshift/release
Length of output: 392
🏁 Script executed:
# Look for image references or base image specifications in the beaker-kind-provision step directory
find ci-operator/step-registry/trusted-execution-clusters -name "*beaker-kind-provision*" -type f -exec cat {} \;Repository: openshift/release
Length of output: 35311
Guard the nc probe against a missing binary and stderr capture.
The TCP reachability check should be defensive:
ncis not guaranteed in all Prow CI images. If it's absent, thecommand not founderror will silently cause theifcondition to fail, making the log message "does not accept on port 22" misleading when the real issue is a missing tool. Checking forncavailability upfront removes ambiguity.- Both
nc -zvandncat -zvwrite their output to stderr, which gets lost unless redirected. Capturing2>&1ensures diagnostics stay together in logs.
Consider checking for nc availability and redirecting stderr for clarity. If nc is unavailable, use bash's /dev/tcp as a fallback so the probe still runs:
🛠 Suggested fix
-if nc -zv "${BEAKER_IP}" 22; then
- log_success "Beaker machine ${BEAKER_IP} accepts on port 22"
-else
- log_warn "Beaker machine ${BEAKER_IP} does not accept on port 22, trying SSH anyhow"
-fi
+if command -v nc >/dev/null 2>&1; then
+ if nc -zv "${BEAKER_IP}" 22 2>&1; then
+ log_success "Beaker machine ${BEAKER_IP} accepts on port 22"
+ else
+ log_warn "Beaker machine ${BEAKER_IP} does not accept on port 22, trying SSH anyhow"
+ fi
+else
+ if timeout 5 bash -c ">/dev/tcp/${BEAKER_IP}/22" 2>/dev/null; then
+ log_success "Beaker machine ${BEAKER_IP} accepts on port 22 (via /dev/tcp)"
+ else
+ log_warn "Beaker machine ${BEAKER_IP} does not accept on port 22, trying SSH anyhow"
+ fi
+fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if nc -zv "${BEAKER_IP}" 22; then | |
| log_success "Beaker machine ${BEAKER_IP} accepts on port 22" | |
| else | |
| log_warn "Beaker machine ${BEAKER_IP} does not accept on port 22, trying SSH anyhow" | |
| fi | |
| if command -v nc >/dev/null 2>&1; then | |
| if nc -zv "${BEAKER_IP}" 22 2>&1; then | |
| log_success "Beaker machine ${BEAKER_IP} accepts on port 22" | |
| else | |
| log_warn "Beaker machine ${BEAKER_IP} does not accept on port 22, trying SSH anyhow" | |
| fi | |
| else | |
| if timeout 5 bash -c ">/dev/tcp/${BEAKER_IP}/22" 2>/dev/null; then | |
| log_success "Beaker machine ${BEAKER_IP} accepts on port 22 (via /dev/tcp)" | |
| else | |
| log_warn "Beaker machine ${BEAKER_IP} does not accept on port 22, trying SSH anyhow" | |
| fi | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/beaker-kind-provision/trusted-execution-clusters-ref-operator-beaker-kind-provision-commands.sh`
around lines 156 - 160, The TCP probe using nc should be made defensive: first
test for availability of nc or ncat (e.g., via command -v) and log a clear
message if neither exists; when invoking nc/ncat in the if condition (the block
referencing BEAKER_IP and port 22), capture stderr by redirecting 2>&1 so
diagnostic output appears in logs; if no nc/ncat is found, fall back to a bash
/dev/tcp probe (using the BEAKER_IP variable) and use its exit status for the
same success/fail branch, ensuring the existing log_success/log_warn messages
are preserved and augmented with the captured diagnostic output when available.
12221f9 to
6d0ed49
Compare
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
- ssh-add the private key - Verbose SSH connection on first attempt - no password authentication Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
6d0ed49 to
1195e53
Compare
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
[REHEARSALNOTIFIER]
Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
Was able to connect in an upstream PR, i.e. without the changes proposed here. Closing. @alicefr lmk if you urgently want to change something here still |
|
need more debugging after all, sorry for the noise |
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@Jakob-Naucke: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Can ping but SSH times out, despite SSH being reachable from other hosts. Investigate.
Summary by CodeRabbit