
@kaovilai (Member) commented Nov 6, 2025

Summary

This PR improves the reliability and stability of the MongoDB persistent test application used in E2E tests across all deployment variants (standard, CSI, and block storage).

Changes Made

1. Image Version Pinning

  • Changed: MongoDB image tag from latest to 7.0
  • Why: The latest tag can pull in a new major version at any time, introducing unexpected breaking changes and making test runs non-deterministic. Pinning to MongoDB 7.0 keeps behavior consistent across runs.
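A quick way to confirm that a manifest no longer relies on latest is a small check on each image reference. The helper below is hypothetical (is_pinned_image is not part of the repository); it treats a missing tag as an implicit latest and accepts digest references.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): returns 0 when an image
# reference is pinned, i.e. has a digest or an explicit tag other than "latest".
is_pinned_image() {
  local ref="$1"
  # A digest reference (name@sha256:...) is always pinned.
  if [[ "$ref" == *@sha256:* ]]; then
    return 0
  fi
  # Only inspect the last path component so that a registry port such as
  # "registry.example.com:5000/mongo" is not mistaken for a tag.
  local name="${ref##*/}"
  if [[ "$name" != *:* ]]; then
    return 1  # no tag at all means the runtime pulls "latest"
  fi
  [[ "${name##*:}" != "latest" ]]
}
```

For example, is_pinned_image mongo:7.0 succeeds, while is_pinned_image mongo:latest and is_pinned_image mongo both fail.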

2. Resource Configuration Improvements

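  • Changed: Added an explicit memory request (512Mi) alongside a 1Gi memory limit
  • Why:
    • Kubernetes best practice is to define both requests and limits
    • The scheduler reserves the 512Mi request on the node, and the kubelet evicts pods whose usage exceeds their requests first, so the pod is less likely to be evicted under node memory pressure
    • Gives the pod a well-defined QoS class (Burstable, because the request is lower than the limit)
    • Leaves MongoDB headroom to grow from its 512Mi baseline up to the 1Gi limit before it is OOMKilled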
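The QoS point can be checked with a simplified, hypothetical sketch of the Kubernetes classification for a single-container pod. It compares quantities as literal strings (so 1Gi and 1024Mi are not treated as equal) and ignores init containers, so it is an approximation rather than the kubelet's actual logic.

```shell
#!/usr/bin/env bash
# Simplified QoS classification for a pod with ONE container.
# Arguments: cpu_request cpu_limit memory_request memory_limit ("" = unset).
# Quantities are compared as literal strings; units are not normalized.
qos_class() {
  local cpu_req="$1" cpu_lim="$2" mem_req="$3" mem_lim="$4"
  # When only a limit is set, Kubernetes defaults the request to the limit.
  if [ -z "$cpu_req" ]; then cpu_req="$cpu_lim"; fi
  if [ -z "$mem_req" ]; then mem_req="$mem_lim"; fi

  if [ -z "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
    echo "BestEffort"   # no requests or limits at all
  elif [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] \
      && [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo "Guaranteed"   # CPU and memory limits set, each equal to its request
  else
    echo "Burstable"    # anything in between
  fi
}

# The memory settings from this PR (CPU left unset for illustration):
qos_class "" "" 512Mi 1Gi
```

With a 512Mi request and a 1Gi limit the pod lands in Burstable whatever the CPU settings are, because Guaranteed requires every request to equal its limit.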

3. Enhanced Health Probes

  • Readiness Probe (new):

    • Runs db.runCommand("ping") through mongosh
    • Initial delay: 30s, Period: 10s
    • Ensures the pod only receives traffic once MongoDB can serve requests
  • Liveness Probe (improved):

    • Changed from a TCP socket check to an exec-based mongosh ping
    • Initial delay: 60s, Period: 30s
    • More accurate health detection (validates that the MongoDB process answers commands, not just that the port is open)
  • Startup Probe (optimized):

    • Reduced period from 30s to 10s for faster startup detection
    • Reduced failure threshold from 40 to 12 (10s × 12 = 120s, about 2 minutes for startup, down from 30s × 40 = 20 minutes)
    • More responsive to actual MongoDB readiness
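The probe timings above translate into rough failure windows. The sketch below uses initialDelaySeconds + periodSeconds × failureThreshold as an upper estimate (each attempt can also take up to timeoutSeconds); the startupProbe's initial delay is not stated in this PR, so 0 is assumed for it.

```shell
#!/usr/bin/env bash
# Rough upper estimate (seconds) of how long a probe that fails on every
# attempt takes to act: restart for liveness/startup, NotReady for readiness.
# Each attempt may additionally take up to timeoutSeconds.
probe_window() {
  local initial_delay="$1" period="$2" failure_threshold="$3"
  echo $(( initial_delay + period * failure_threshold ))
}

# Values from this PR; the startupProbe initial delay is assumed to be 0.
echo "readiness:        $(probe_window 30 10 3)s"
echo "liveness:         $(probe_window 60 30 3)s"
echo "startup (before): $(probe_window 0 30 40)s"
echo "startup (after):  $(probe_window 0 10 12)s"
```

This prints 60s, 150s, 1200s and 120s respectively, which shows that the new startup budget is a tenth of the old one rather than the same.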

Why These Changes Were Made

The MongoDB test pods were experiencing stability issues in CI environments, particularly:

  • Pods being marked ready before MongoDB was actually ready to serve requests
  • OOMKilled events due to insufficient memory limits
  • Misleading health checks: TCP probes passed whenever the port was open and could not detect MongoDB-specific failures
  • Slow startup detection causing delays in test execution

How to Test

  1. Deploy the MongoDB test application:

    kubectl apply -f tests/e2e/sample-applications/mongo-persistent/mongo-persistent.yaml
    # Or for CSI variant:
    kubectl apply -f tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml
    # Or for block storage variant:
    kubectl apply -f tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml
  2. Verify pod startup and readiness:

    kubectl get pods -n mongo-persistent -w
  3. Check probe status and resource usage:

    kubectl describe pod -n mongo-persistent -l app=todolist
    kubectl top pod -n mongo-persistent
  4. Run E2E tests that use this application to verify backup/restore functionality still works correctly.
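Steps 2 and 3 can be scripted as a non-interactive check, sketched here under assumptions: the app=todolist selector is reused from step 3 and may need changing to match the MongoDB pod, and the 300s timeout is an arbitrary value that covers the roughly 2-minute startup window. kubectl wait and the status.qosClass and restartCount Pod fields are standard.

```shell
#!/usr/bin/env bash
# Non-interactive version of steps 2 and 3. The selector reuses app=todolist
# from step 3 (adjust it to target the MongoDB pod); the timeout is arbitrary.
NAMESPACE="${NAMESPACE:-mongo-persistent}"
SELECTOR="${SELECTOR:-app=todolist}"

verify_mongo_pods() {
  # Wait until every matching pod reports Ready, i.e. its readiness probe passes.
  kubectl wait --for=condition=Ready pod -l "$SELECTOR" -n "$NAMESPACE" \
    --timeout=300s || return 1
  # Print name, QoS class and per-container restart counts for each pod.
  kubectl get pods -l "$SELECTOR" -n "$NAMESPACE" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.qosClass}{"\t"}{.status.containerStatuses[*].restartCount}{"\n"}{end}'
}
```

A non-zero exit from verify_mongo_pods means a pod did not become Ready in time; a restart count that keeps rising points at liveness failures or OOM kills.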

Impact

  • Test Reliability: Improved stability of MongoDB-based E2E tests
  • CI Success Rate: Should reduce flaky test failures related to MongoDB pod issues
  • No Breaking Changes: Changes are internal to test fixtures only

Note

Responses generated with Claude

coderabbitai bot commented Nov 6, 2025

Walkthrough

Updated MongoDB container images to version 7.0 across three sample application configurations. Added memory resource requests (512Mi) and limits (1Gi). Replaced TCP socket–based probes with exec-based mongosh ping commands for readiness and liveness checks. Adjusted probe timing parameters and failure thresholds.

Changes

Cohort / File(s): MongoDB Persistent Sample Application Probe and Resource Updates
  • tests/e2e/sample-applications/mongo-persistent/mongo-persistent.yaml
  • tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml
  • tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml
Summary: Updated MongoDB image from latest to 7.0, added memory resource requests (512Mi) and limits (1Gi), replaced tcpSocket liveness probes with exec-based readinessProbe and livenessProbe using mongosh ping commands, updated startupProbe with exec command, and adjusted probe timing parameters (initialDelaySeconds, periodSeconds, timeoutSeconds, failureThreshold).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

  • Probe command validation: Verify that exec-based mongosh ping commands are syntactically correct and will execute successfully in the container environment
  • Timing parameter appropriateness: Confirm that adjusted probe intervals and failure thresholds are suitable for each deployment type (block, csi, standard) and won't cause premature pod restarts
  • Resource limit suitability: Validate that 512Mi memory request and 1Gi limit are appropriate for MongoDB 7.0 in each deployment scenario

@openshift-ci openshift-ci bot requested review from eemcmullan and sseago November 6, 2025 22:04
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 6, 2025
Remove local must-gather directory and build process in favor of using the
external quay.io/konveyor/oadp-must-gather:latest image via oc adm must-gather.
This eliminates architecture mismatch issues and keeps must-gather code in its
dedicated repository.

Changes:
- Updated RunMustGather() in tests/e2e/lib/apps.go to use oc adm must-gather
- Added MUST_GATHER_IMAGE env var (defaults to quay.io/konveyor/oadp-must-gather:latest)
- Removed build-must-gather target from Makefile
- Removed entire must-gather/ directory (3,174 lines deleted)
- Updated documentation in TESTING.md

The SKIP_MUST_GATHER flag is preserved for skipping must-gather collection.
Version-specific images can be used by setting MUST_GATHER_IMAGE env var.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
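The behavior this commit message describes (the commit was later reverted in this PR) can be approximated in shell. The real change is Go code in RunMustGather(); run_must_gather, its dest_dir argument, and treating SKIP_MUST_GATHER=true as the skip signal are assumptions for illustration. The --image and --dest-dir flags are standard oc adm must-gather options.

```shell
#!/usr/bin/env bash
# Approximation of RunMustGather() as described in the commit message.
# Assumptions: SKIP_MUST_GATHER=true means "skip"; dest_dir is hypothetical.
run_must_gather() {
  local dest_dir="$1"
  if [ "${SKIP_MUST_GATHER:-false}" = "true" ]; then
    echo "skipping must-gather"
    return 0
  fi
  # Default to the external image unless MUST_GATHER_IMAGE overrides it.
  local image="${MUST_GATHER_IMAGE:-quay.io/konveyor/oadp-must-gather:latest}"
  oc adm must-gather --image="$image" --dest-dir="$dest_dir"
}
```

Setting MUST_GATHER_IMAGE to a version-specific tag before calling it is how the commit message suggests selecting a different image.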

Revert "refactor(e2e): migrate to external oadp-must-gather container image"

This reverts commit 09a2a49.

Revert "fix(e2e): derive must-gather directory pattern from image name"

This reverts commit 2ae6d45.

fix(e2e): update mongo image version and resource limits in mongo-persistent deployment


fix(e2e): derive must-gather directory pattern from image name

The directory pattern was hardcoded to 'quay-io-konveyor-oadp-must-gather-*'
which breaks when using custom images via MUST_GATHER_IMAGE env var.

Now dynamically derives the pattern from the actual image name by replacing
registry separators (. / :) with hyphens to match oc adm must-gather's
directory naming convention.

Examples:
- quay.io/konveyor/oadp-must-gather:latest -> quay-io-konveyor-oadp-must-gather-latest-*
- docker.io/myuser/custom:v1 -> docker-io-myuser-custom-v1-*

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
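The derivation rule in this commit message (also reverted in this PR) amounts to a single character substitution. must_gather_dir_pattern is a hypothetical helper; its output mirrors the examples given in the commit message rather than anything verified against oc adm must-gather itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an image reference into the glob used to find the
# must-gather output directory, by replacing ".", "/" and ":" with "-".
must_gather_dir_pattern() {
  local image="$1"
  local flattened
  flattened="$(printf '%s' "$image" | sed 's|[./:]|-|g')"
  printf '%s-*\n' "$flattened"
}

must_gather_dir_pattern quay.io/konveyor/oadp-must-gather:latest
must_gather_dir_pattern docker.io/myuser/custom:v1
```

The two calls print quay-io-konveyor-oadp-must-gather-latest-* and docker-io-myuser-custom-v1-*, matching the commit message's examples.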
@kaovilai changed the title from "mongo ci stable something" to "fix(e2e): improve MongoDB test application stability and resource configuration" Nov 6, 2025

coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 5048a8f and c63e497.

📒 Files selected for processing (3)
  • tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml (4 hunks)
  • tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml (2 hunks)
  • tests/e2e/sample-applications/mongo-persistent/mongo-persistent.yaml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml
  • tests/e2e/sample-applications/mongo-persistent/mongo-persistent.yaml
  • tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml
🔇 Additional comments (1)
tests/e2e/sample-applications/mongo-persistent/mongo-persistent.yaml (1)

98-102: Resource configuration is appropriate for sample applications.

The memory request (512Mi) and limit (1Gi) are reasonable defaults for a sample MongoDB deployment.

Comment on lines +136 to +155
    readinessProbe:
      exec:
        command:
        - /bin/bash
        - -c
        - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    livenessProbe:
      exec:
        command:
        - /bin/bash
        - -c
        - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
      initialDelaySeconds: 60
      periodSeconds: 30
      timeoutSeconds: 10
      failureThreshold: 3

⚠️ Potential issue | 🟠 Major

Authentication mismatch between startup and readiness/liveness probes.

Same authentication inconsistency as the other two files: readinessProbe and livenessProbe lack credentials while startupProbe authenticates, causing probe failures once MongoDB enforces authentication.

Apply the authentication fix:

               readinessProbe:
                 exec:
                   command:
                   - /bin/bash
                   - -c
-                  - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
+                  - "mongosh admin --authenticationDatabase admin -u \"$MONGO_INITDB_ROOT_USERNAME\" -p \"$MONGO_INITDB_ROOT_PASSWORD\" --eval 'db.runCommand(\"ping\")' --quiet"
                 initialDelaySeconds: 30
                 periodSeconds: 10
                 timeoutSeconds: 5
                 failureThreshold: 3
               livenessProbe:
                 exec:
                   command:
                   - /bin/bash
                   - -c
-                  - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
+                  - "mongosh admin --authenticationDatabase admin -u \"$MONGO_INITDB_ROOT_USERNAME\" -p \"$MONGO_INITDB_ROOT_PASSWORD\" --eval 'db.runCommand(\"ping\")' --quiet"
                 initialDelaySeconds: 60
                 periodSeconds: 30
                 timeoutSeconds: 10
                 failureThreshold: 3
🤖 Prompt for AI Agents
In tests/e2e/sample-applications/mongo-persistent/mongo-persistent-block.yaml
around lines 136 to 155, the readinessProbe and livenessProbe call mongosh
without authentication while startupProbe uses credentials, causing failures
when MongoDB requires auth; update both probes to use the same authentication
parameters as startupProbe (e.g., add the same --username, --password and
--authenticationDatabase flags or use the identical authenticated connection
string used by startupProbe) so all probes authenticate consistently and succeed
once auth is enforced.
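To confirm the suggested authenticated command actually succeeds inside the container (the first item in the review checklist above), it can be run by hand with kubectl exec. kubectl exec exits with the remote command's status, the same 0/non-zero signal the kubelet uses for an exec probe. run_probe_in_pod and the pod and container names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Run the authenticated ping from the suggested fix inside a MongoDB container.
# Arguments: pod, container, namespace (defaults to mongo-persistent).
run_probe_in_pod() {
  local pod="$1" container="$2" ns="${3:-mongo-persistent}"
  # Single quotes keep $MONGO_INITDB_ROOT_* unexpanded on this machine; bash in
  # the container expands them from the container's environment, as the probe does.
  # '"'"' is the usual trick for embedding a single quote in a single-quoted string.
  local probe_cmd='mongosh admin --authenticationDatabase admin -u "$MONGO_INITDB_ROOT_USERNAME" -p "$MONGO_INITDB_ROOT_PASSWORD" --eval '"'"'db.runCommand("ping")'"'"' --quiet'
  kubectl exec -n "$ns" "$pod" -c "$container" -- /bin/bash -c "$probe_cmd"
  local rc=$?
  echo "probe exit status: $rc"
  return "$rc"
}
```

A non-zero status here (for example from an authentication error) is what would make the readiness or liveness probe fail in the pod.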

Comment on lines +93 to +112
    readinessProbe:
      exec:
        command:
        - /bin/bash
        - -c
        - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    livenessProbe:
      exec:
        command:
        - /bin/bash
        - -c
        - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
      initialDelaySeconds: 60
      periodSeconds: 30
      timeoutSeconds: 10
      failureThreshold: 3

⚠️ Potential issue | 🟠 Major

Authentication mismatch between startup and readiness/liveness probes.

Same issue as in mongo-persistent.yaml: readinessProbe and livenessProbe lack credentials while startupProbe authenticates. This will cause readiness/liveness probe failures once MongoDB enforces authentication.

Apply the same fix as in mongo-persistent.yaml to add authentication:

             readinessProbe:
               exec:
                 command:
                 - /bin/bash
                 - -c
-                - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
+                - "mongosh admin --authenticationDatabase admin -u \"$MONGO_INITDB_ROOT_USERNAME\" -p \"$MONGO_INITDB_ROOT_PASSWORD\" --eval 'db.runCommand(\"ping\")' --quiet"
               initialDelaySeconds: 30
               periodSeconds: 10
               timeoutSeconds: 5
               failureThreshold: 3
             livenessProbe:
               exec:
                 command:
                 - /bin/bash
                 - -c
-                - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
+                - "mongosh admin --authenticationDatabase admin -u \"$MONGO_INITDB_ROOT_USERNAME\" -p \"$MONGO_INITDB_ROOT_PASSWORD\" --eval 'db.runCommand(\"ping\")' --quiet"
               initialDelaySeconds: 60
               periodSeconds: 30
               timeoutSeconds: 10
               failureThreshold: 3
🤖 Prompt for AI Agents
In tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml
around lines 93 to 112, the readinessProbe and livenessProbe exec commands call
mongosh without credentials while the startupProbe uses authentication; update
both probes to use the same authentication options as the startupProbe (e.g.,
add --username, --password and --authenticationDatabase or switch to the same
authenticated connection string used by startupProbe) so the probes can
authenticate once MongoDB enforces auth; ensure shell quoting/escaping matches
the startupProbe invocation.

Comment on lines +106 to +125
    readinessProbe:
      exec:
        command:
        - /bin/bash
        - -c
        - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    livenessProbe:
      exec:
        command:
        - /bin/bash
        - -c
        - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
      initialDelaySeconds: 60
      periodSeconds: 30
      timeoutSeconds: 10
      failureThreshold: 3

⚠️ Potential issue | 🟠 Major

Authentication mismatch between startup and readiness/liveness probes.

The readinessProbe and livenessProbe execute mongosh without credentials (lines 106-115, 116-125), while the startupProbe (line 127+) authenticates using MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD. Since MongoDB is initialized with root credentials via environment variables, the readiness and liveness probes will fail with authentication errors, preventing the pod from becoming ready even after startup succeeds.

Apply this diff to add authentication to both probes:

             readinessProbe:
               exec:
                 command:
                 - /bin/bash
                 - -c
-                - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
+                - "mongosh admin --authenticationDatabase admin -u \"$MONGO_INITDB_ROOT_USERNAME\" -p \"$MONGO_INITDB_ROOT_PASSWORD\" --eval 'db.runCommand(\"ping\")' --quiet"
               initialDelaySeconds: 30
               periodSeconds: 10
               timeoutSeconds: 5
               failureThreshold: 3
             livenessProbe:
               exec:
                 command:
                 - /bin/bash
                 - -c
-                - "mongosh --eval 'db.runCommand(\"ping\")' --quiet"
+                - "mongosh admin --authenticationDatabase admin -u \"$MONGO_INITDB_ROOT_USERNAME\" -p \"$MONGO_INITDB_ROOT_PASSWORD\" --eval 'db.runCommand(\"ping\")' --quiet"
               initialDelaySeconds: 60
               periodSeconds: 30
               timeoutSeconds: 10
               failureThreshold: 3

@kaovilai (Member Author) commented Nov 7, 2025

/retest

@kaovilai (Member Author) commented Nov 7, 2025

/test 4.19-e2e-test-aws

@kaovilai (Member Author) commented Nov 7, 2025

/retest

openshift-ci bot commented Nov 7, 2025

@kaovilai: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@weshayutin (Contributor)

@mpryc @kaovilai @shubham-pampattiwar ha.. this worked :) health checks + pinning mongo to a more stable version.

@kaovilai (Member Author) commented Nov 7, 2025

/retest

ai-retester: The test Mongo application CSI via CLI timed out after 540 seconds because the todolist pod never reached a succeeded phase. This indicated a problem with the application deployment or interaction in the test setup, causing the e2e tests to fail.

The e2e-test-aws-e2e step failed because the Mongo application CSI test timed out after 540 seconds. The 'todolist' container in mongo pod never succeeded, resulting in a failure.

The test Mongo application CSI timed out because a pod failed to start, specifically the todolist container was stuck in PodInitializing. This suggests a problem with application deployment or resource availability within the test environment.

@weshayutin (Contributor) left a comment

woot!

@weshayutin (Contributor)

/LGTM

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2025

@weshayutin (Contributor)

/retest

ai-retester: The test Mongo application CSI via CLI timed out after 540 seconds because the todolist pod never reached a succeeded phase. This indicated a problem with the application deployment or interaction in the test setup, causing the e2e tests to fail.

The e2e-test-aws-e2e step failed because the Mongo application CSI test timed out after 540 seconds. The 'todolist' container in mongo pod never succeeded, resulting in a failure.

The test Mongo application CSI timed out because a pod failed to start, specifically the todolist container was stuck in PodInitializing. This suggests a problem with application deployment or resource availability within the test environment.

@kaovilai PFFT AI

@mpryc (Contributor) left a comment

/lgtm

openshift-ci bot commented Nov 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, mpryc, weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 9feb185 into openshift:oadp-dev Nov 7, 2025
15 checks passed