@radofuchs radofuchs commented Sep 3, 2025

Description

Harden the auth e2e tests by adding a health check after the Docker restart.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Tests
    • Improved end-to-end test stability by waiting for container health instead of fixed delays.
    • Added container-aware utilities with optional cleanup to streamline configuration switching during test runs.
    • Introduced retry logic and timeouts to reduce flakiness when containers start slowly.
    • Enhanced diagnostics with clearer messages on health check failures to aid investigation.

coderabbitai bot commented Sep 3, 2025

Walkthrough

Adds wait_for_container_health(container_name, max_attempts=3) to E2E test utilities and updates switch_config_and_restart to accept container_name and cleanup; after restart it now polls Docker health until "healthy" (with timeouts/retries and error handling) instead of a fixed sleep.

Changes

Cohort / File(s) | Summary
E2E Test Utilities (tests/e2e/utils/utils.py) | Added wait_for_container_health(container_name: str, max_attempts: int = 3) -> None, which polls docker inspect with a 10s subprocess timeout, retries with 5s sleeps, and handles CalledProcessError/TimeoutExpired while logging attempts. Updated switch_config_and_restart(original_file: str, replacement_file: str, container_name: str, cleanup: bool = False) -> str to accept container_name and cleanup and to call wait_for_container_health after restarting the container instead of sleeping.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester
  participant Switch as switch_config_and_restart
  participant Docker as Docker Engine
  participant Health as wait_for_container_health

  Tester->>Switch: switch_config_and_restart(orig, repl, container_name, cleanup?)
  Switch->>Docker: Copy replacement config & restart container
  Switch->>Health: wait_for_container_health(container_name, max_attempts=3)
  loop up to 3 attempts
    Health->>Docker: docker inspect --format {{.State.Health.Status}} (10s timeout)
    alt status == "healthy"
      Health-->>Switch: healthy → return
    else status != "healthy" or error/timeout
      Health-->>Health: log attempt, sleep 5s, retry
    end
  end
  Note over Health,Switch: After final failed attempt, logs timeout/error and returns

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • tisnik

Poem

hop hop — I wait for the green,
I poke the shell, I listen keen.
A tiny timeout, a patient sniff,
health checks pass — the tests can whiff.
Config swapped, the pipeline's seen. 🐇✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/e2e/utils/utils.py (2)

68-74: Fix backup semantics; align docstring and logic. Always create backup; let cleanup only delete it.

Skipping backup creation when cleanup=True risks losing the ability to restore and contradicts the docstring.

 def switch_config_and_restart(
     original_file: str,
     replacement_file: str,
     container_name: str,
     cleanup: bool = False,
 ) -> str:
@@
-        cleanup: If True, remove the backup file after restoration (default: False)
+        cleanup: If True, remove the backup file at the end of this call (default: False)
@@
-    if not cleanup and not os.path.exists(backup_file):
+    if not os.path.exists(backup_file):
         try:
             shutil.copy(original_file, backup_file)

Also applies to: 80-81, 87-93
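The backup rule suggested above (always create the backup, let cleanup only delete it) can be sketched in isolation. This is a minimal sketch, not the PR's code: the helper name swap_config and the ".backup" suffix are assumptions, and the container restart is left out.

```python
import os
import shutil


def swap_config(original_file: str, replacement_file: str, cleanup: bool = False) -> str:
    """Back up original_file, overwrite it with replacement_file, return the backup path."""
    backup_file = f"{original_file}.backup"  # assumed naming scheme, not taken from the PR
    # Always create the backup when it is missing, regardless of cleanup.
    if not os.path.exists(backup_file):
        shutil.copy(original_file, backup_file)
    shutil.copy(replacement_file, original_file)
    # cleanup only decides whether the backup is deleted at the end of this call.
    if cleanup and os.path.exists(backup_file):
        os.remove(backup_file)
    return backup_file
```

Calling it once with cleanup=False switches the config and leaves a backup; calling it again with the backup as replacement_file and cleanup=True restores the original and removes the backup.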


100-111: Add a timeout to docker restart and separate exception handling.

Without a timeout, tests can hang indefinitely; combine with the new health wait for robust readiness.

     # Restart container
     try:
         subprocess.run(
             ["docker", "restart", container_name],
             capture_output=True,
             text=True,
-            check=True,
+            check=True,
+            timeout=60,
         )
-    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
-        print(f"Failed to restart container {container_name}: {e.stderr}")
+    except subprocess.CalledProcessError as e:
+        print(f"Failed to restart container {container_name}: {e.stderr}")
+        raise
+    except subprocess.TimeoutExpired as e:
+        print(f"Timed out restarting container {container_name} after {e.timeout}s")
         raise
@@
-    # Wait for container to be healthy
+    # Wait for container to be healthy (raises on timeout)
     wait_for_container_health(container_name)

Also applies to: 112-114
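To exercise the two exception paths without a Docker daemon, the restart step can be wrapped so the subprocess call is injectable. A minimal sketch under stated assumptions: the function name restart_container and the runner parameter are hypothetical, and only subprocess.run's documented keyword arguments are used.

```python
import subprocess
from typing import Callable


def restart_container(
    container_name: str,
    timeout_sec: float = 60,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> None:
    """Restart a container; re-raise on non-zero exit or on timeout."""
    cmd = ["docker", "restart", container_name]
    try:
        runner(cmd, capture_output=True, text=True, check=True, timeout=timeout_sec)
    except subprocess.CalledProcessError as e:
        # Non-zero exit: stderr carries the captured error output.
        print(f"Failed to restart container {container_name}: {e.stderr}")
        raise
    except subprocess.TimeoutExpired as e:
        # Only reachable because timeout= is passed to the runner.
        print(f"Timed out restarting container {container_name} after {e.timeout}s")
        raise
```

In a unit test, runner can be replaced with a function that raises either exception, so both branches are covered without running Docker.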

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ddbe64d and e277521.

📒 Files selected for processing (1)
  • tests/e2e/utils/utils.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests

@radofuchs radofuchs force-pushed the Auth-w2w-test-hardening branch from dfcbed0 to 0a946cd Compare September 3, 2025 09:12
@tisnik tisnik left a comment

LGTM, nice one

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/utils/utils.py (1)

70-75: Add missing container_name (and optional cleanup) to all switch_config_and_restart calls

The invocations in tests/e2e/features/environment.py at lines 90 and 100 are still using the old two-arg signature and must be updated to include the required container_name parameter (and cleanup if needed).
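For illustration, here is why the old two-argument calls break. This sketch uses a stub with the signature from this PR; the file names and the container name "my-service" are hypothetical placeholders, not values from environment.py.

```python
def switch_config_and_restart(
    original_file: str,
    replacement_file: str,
    container_name: str,
    cleanup: bool = False,
) -> str:
    """Stub with the PR's signature; the real body lives in tests/e2e/utils/utils.py."""
    return f"{original_file}.backup"  # placeholder return value (assumption)


def call_old_style() -> str:
    """Old two-argument call: fails because container_name has no default."""
    try:
        switch_config_and_restart("config.yaml", "config-auth.yaml")  # type: ignore[call-arg]
    except TypeError as e:
        return str(e)
    return "no error"


def call_new_style() -> str:
    """Updated call that passes the required container_name explicitly."""
    return switch_config_and_restart(
        "config.yaml",
        "config-auth.yaml",
        container_name="my-service",  # hypothetical container name
        cleanup=False,
    )
```

The old-style call raises TypeError at call time, so the affected e2e hooks would fail before any container is touched.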

♻️ Duplicate comments (1)
tests/e2e/utils/utils.py (1)

35-68: Fail fast on unhealthy/timeout; fix double-sleep; support containers without HEALTHCHECK.

Current loop swallows failures, never raises on timeout, and sleeps twice per attempt (once inside try, once after), making timing/logs inaccurate. Tests may proceed while the container isn’t ready.
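The double sleep can be checked by replaying the loop's control flow with the sleeps recorded instead of executed. A small sketch, assuming the status probe never raises; the function name and the get_status parameter are hypothetical.

```python
from typing import Callable, List


def original_loop_sleeps(max_attempts: int, get_status: Callable[[], str]) -> List[int]:
    """Replay the pre-review loop and record each sleep it would perform."""
    sleeps: List[int] = []
    for attempt in range(max_attempts):
        if get_status() == "healthy":
            break
        if attempt < max_attempts - 1:
            sleeps.append(5)  # sleep inside the try block
        if attempt < max_attempts - 1:
            sleeps.append(5)  # second sleep after the try/except
    return sleeps


# With the default of 3 attempts and a container that never becomes healthy:
never_healthy = original_loop_sleeps(3, lambda: "starting")
total_wait = sum(never_healthy)
```

Here total_wait comes to 20 seconds (four sleeps of 5s), while the log message reports max_attempts * 5 = 15 seconds.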

Apply this hardened rewrite:

-def wait_for_container_health(container_name: str, max_attempts: int = 3) -> None:
-    """Wait for container to be healthy."""
-    for attempt in range(max_attempts):
-        try:
-            result = subprocess.run(
-                [
-                    "docker",
-                    "inspect",
-                    "--format={{.State.Health.Status}}",
-                    container_name,
-                ],
-                capture_output=True,
-                text=True,
-                check=True,
-                timeout=10,
-            )
-            if result.stdout.strip() == "healthy":
-                break
-            else:
-                if attempt < max_attempts - 1:
-                    time.sleep(5)
-                else:
-                    print(
-                        f"{container_name} not healthy after {max_attempts * 5} seconds"
-                    )
-        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
-            pass
-
-        if attempt < max_attempts - 1:
-            print(f"⏱ Attempt {attempt + 1}/{max_attempts} - waiting...")
-            time.sleep(5)
-        else:
-            print(f"Could not check health status for {container_name}")
+def wait_for_container_health(
+    container_name: str,
+    max_attempts: int = 6,
+    interval_sec: int = 5,
+    inspect_timeout_sec: int = 10,
+) -> None:
+    """Wait for container to be healthy (or running if no HEALTHCHECK); raise on timeout."""
+    start = time.monotonic()
+    last_status = "unknown"
+    for attempt in range(1, max_attempts + 1):
+        try:
+            result = subprocess.run(
+                [
+                    "docker",
+                    "inspect",
+                    "--format={{if .State.Health}}{{.State.Health.Status}}{{else}}no-healthcheck{{end}}",
+                    container_name,
+                ],
+                capture_output=True,
+                text=True,
+                check=True,
+                timeout=inspect_timeout_sec,
+            )
+            status = result.stdout.strip()
+            last_status = status
+            if status == "healthy":
+                return
+            if status == "no-healthcheck":
+                # Fallback: consider "running" as ready when no HealthCheck is defined.
+                state = subprocess.run(
+                    ["docker", "inspect", "--format={{.State.Status}}", container_name],
+                    capture_output=True,
+                    text=True,
+                    check=True,
+                    timeout=inspect_timeout_sec,
+                )
+                if state.stdout.strip() == "running":
+                    return
+        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
+            # ignore and retry
+            pass
+
+        if attempt < max_attempts:
+            print(f"⏱ Attempt {attempt}/{max_attempts} - waiting...")
+            time.sleep(interval_sec)
+        else:
+            elapsed = int(time.monotonic() - start)
+            raise TimeoutError(f"{container_name} not ready after {elapsed}s (last status: {last_status})")
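The no-HEALTHCHECK fallback in the rewrite above can also be expressed against the JSON that docker inspect prints by default, which keeps the decision logic testable without Docker. A minimal sketch: the function name and the "ready"/"waiting" labels are assumptions, and the sample documents used to try it out would be hand-written and trimmed, not captured output.

```python
import json


def readiness_from_inspect(inspect_json: str) -> str:
    """Classify the first container in `docker inspect` JSON as 'ready' or 'waiting'."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON array
    state = containers[0]["State"]
    health = state.get("Health")  # absent when the image defines no HEALTHCHECK
    if health:
        return "ready" if health.get("Status") == "healthy" else "waiting"
    # Fallback: without a health check, treat a running container as ready.
    return "ready" if state.get("Status") == "running" else "waiting"
```

This applies the same rule as the Go-template version: a health status wins when present, otherwise the plain running state is used.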
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between e277521 and 0a946cd.

📒 Files selected for processing (1)
  • tests/e2e/utils/utils.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests

Comment on lines +114 to 116
# Wait for container to be healthy
wait_for_container_health(container_name)

🛠️ Refactor suggestion

Surface restart failures with a timeout and clearer error reporting.

TimeoutExpired won’t fire without a timeout; also bubble up a concise message.

     # Restart container
     try:
         subprocess.run(
             ["docker", "restart", container_name],
             capture_output=True,
             text=True,
             check=True,
+            timeout=60,
         )
-    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
-        print(f"Failed to restart container {container_name}: {e.stderr}")
+    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
+        err = getattr(e, "stderr", None) or str(e)
+        print(f"Failed to restart container {container_name}: {err}")
         raise
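The getattr(...) or str(e) fallback matters because a TimeoutExpired usually carries no stderr text. A short sketch of how the message resolves for each exception type; the helper name describe_failure and the container name "my-service" are hypothetical.

```python
import subprocess


def describe_failure(e: Exception) -> str:
    """Prefer captured stderr; fall back to the exception's own string."""
    return getattr(e, "stderr", None) or str(e)


cmd = ["docker", "restart", "my-service"]  # hypothetical container name
failed = subprocess.CalledProcessError(1, cmd, stderr="No such container")
timed_out = subprocess.TimeoutExpired(cmd, 60)

failed_msg = describe_failure(failed)        # uses the captured stderr
timed_out_msg = describe_failure(timed_out)  # stderr is None, so str(e) is used
```

For the timeout case the message comes from the exception itself and names the command and the 60 second limit, instead of printing "None".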
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    # Wait for container to be healthy
    wait_for_container_health(container_name)
    # Restart container
    try:
        subprocess.run(
            ["docker", "restart", container_name],
            capture_output=True,
            text=True,
            check=True,
            timeout=60,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
        err = getattr(e, "stderr", None) or str(e)
        print(f"Failed to restart container {container_name}: {err}")
        raise
🤖 Prompt for AI Agents
In tests/e2e/utils/utils.py around lines 114 to 116, the call to
wait_for_container_health(container_name) currently can hang indefinitely and
any TimeoutExpired won't be raised because no timeout is provided; update the
call to pass a reasonable timeout (e.g., timeout_seconds or a constant) and wrap
the call in a try/except that catches TimeoutExpired (and optionally
subprocess.TimeoutExpired) and re-raises a concise, informative exception (or
raise RuntimeError) that includes the container name and that the container
failed to become healthy within the timeout; ensure the timeout value is
configurable or clearly documented.

@tisnik tisnik merged commit fda810e into lightspeed-core:main Sep 3, 2025
19 checks passed