Skip to content

LCORE-1472: Add e2e retriy capability#1500

Merged
radofuchs merged 4 commits into
lightspeed-core:mainfrom
radofuchs:LCORE-1472_e2e_retries
Apr 14, 2026
Merged

LCORE-1472: Add e2e retriy capability#1500
radofuchs merged 4 commits into
lightspeed-core:mainfrom
radofuchs:LCORE-1472_e2e_retries

Conversation

@radofuchs
Copy link
Copy Markdown
Contributor

@radofuchs radofuchs commented Apr 14, 2026

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • Tests
    • Enhanced test reliability with automatic retry support for identified unstable tests
    • Improved test assertion accuracy with case-insensitive matching to reduce false negatives

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 14, 2026

Warning

Rate limit exceeded

@radofuchs has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 25 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 46 minutes and 25 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 49a24790-1da6-4227-ac58-dd522ae0cc29

📥 Commits

Reviewing files that changed from the base of the PR and between 1c6ed3d and c3b8193.

📒 Files selected for processing (6)
  • tests/e2e/features/environment.py
  • tests/e2e/features/mcp.feature
  • tests/e2e/features/query.feature
  • tests/e2e/features/responses.feature
  • tests/e2e/features/responses_streaming.feature
  • tests/e2e/features/streaming_query.feature

Walkthrough

The changes introduce opt-in auto-retry infrastructure for flaky e2e tests and apply @flaky tags across multiple test scenarios. Additionally, response assertion logic is refactored to perform case-insensitive substring matching using a new helper function.

Changes

Cohort / File(s) Summary
Flaky Retry Infrastructure
tests/e2e/features/environment.py
Implemented scenario auto-retry support in before_feature hook; imports patch_scenario_with_autoretry, computes max_flaky retry cap from environment configuration, and patches scenarios tagged @flaky to retry on failures.
Flaky Test Tags
tests/e2e/features/mcp.feature, tests/e2e/features/query.feature, tests/e2e/features/responses.feature, tests/e2e/features/responses_streaming.feature, tests/e2e/features/streaming_query.feature
Added @flaky tags to success-path e2e scenarios across multiple feature files, including MCP authentication flows, LLM response tests, streaming query tests, and conversation management tests. One minor formatting change (blank line removal) in streaming_query.feature.
Response Assertion Helpers
tests/e2e/features/steps/llm_query_response.py
Introduced _response_contains_fragment() helper function for case-insensitive substring matching; refactored check_fragments_in_responses_output_text(), check_fragments_in_response(), and check_streamed_fragments_in_response() to use the helper and updated assertion messages accordingly.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title contains a typo ('retriy' instead of 'retry') and is vague about scope; while it references the core intent (retry capability), the typo and lack of specificity about which e2e tests are affected reduce clarity. Correct the typo to 'Add e2e retry capability' for accuracy and clarity.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/features/environment.py`:
- Around line 38-39: The _E2E_FEATURE_PERF_START_ATTR constant and timing doc
are stale because no timestamp is recorded or used; either remove the
constant/docs or wire the timing by setting
feature._lightspeed_e2e_feature_perf_start in the Behave hook before_feature and
computing/consuming the duration in after_feature (use time.time() or
datetime.now()), then store or log the computed duration (e.g., on feature or
context) so the declared feature-duration tracking is actually recorded and
consumed; update or remove any related docstrings and references accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c1d5cb2-4f4e-4894-83eb-952383ac648d

📥 Commits

Reviewing files that changed from the base of the PR and between 6acfc16 and 1c6ed3d.

📒 Files selected for processing (7)
  • tests/e2e/features/environment.py
  • tests/e2e/features/mcp.feature
  • tests/e2e/features/query.feature
  • tests/e2e/features/responses.feature
  • tests/e2e/features/responses_streaming.feature
  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/features/streaming_query.feature
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: list_outdated_dependencies
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: Pylinter
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (2)
tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

  • tests/e2e/features/query.feature
  • tests/e2e/features/responses.feature
  • tests/e2e/features/streaming_query.feature
  • tests/e2e/features/responses_streaming.feature
  • tests/e2e/features/mcp.feature
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/features/environment.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Applied to files:

  • tests/e2e/features/query.feature
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Place E2E step definitions in `tests/e2e/features/steps/`

Applied to files:

  • tests/e2e/features/query.feature
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Maintain E2E test list in `tests/e2e/test_list.txt`

Applied to files:

  • tests/e2e/features/query.feature
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/features/query.feature
  • tests/e2e/features/mcp.feature
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/features/query.feature
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.

Applied to files:

  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/features/environment.py
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.

Applied to files:

  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/features/environment.py
📚 Learning: 2026-02-19T10:06:58.708Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:58.708Z
Learning: In tests/e2e-prow/rhoai/, ConfigMaps like mock-jwks-script and mcp-mock-server-script are created dynamically by the pipeline.sh deployment script using `oc create configmap` commands, rather than being defined as static ConfigMap resources in the manifest YAML files.

Applied to files:

  • tests/e2e/features/mcp.feature
🔇 Additional comments (7)
tests/e2e/features/steps/llm_query_response.py (1)

28-30: Good consistency improvement for fragment assertions.

The case-insensitive helper is applied consistently across all three fragment-checking steps, which should reduce brittle failures caused only by response capitalization while preserving clear failure output.

Also applies to: 246-247, 259-262, 272-273, 298-301, 311-312, 322-325

tests/e2e/features/responses.feature (1)

337-337: @flaky tagging is correctly scoped.

Nice targeted application of retry behavior to the known unstable scenario without changing its test intent.

tests/e2e/features/query.feature (1)

13-13: Flaky annotations look appropriate for these query scenarios.

The change is focused and keeps scenario behavior unchanged while enabling retry only where needed.

Also applies to: 26-26

tests/e2e/features/responses_streaming.feature (1)

328-328: Good targeted retry enablement for streaming conversation-flow scenarios.

These @flaky tags are applied narrowly to the high-variance paths and preserve existing assertions.

Also applies to: 359-359, 381-381

tests/e2e/features/streaming_query.feature (1)

22-22: Scoped @flaky updates look good.

Retry is enabled only for the unstable streaming-query prompt-variant scenarios, with no behavioral drift.

Also applies to: 36-36

tests/e2e/features/environment.py (1)

374-382: Flaky autoretry wiring is clean and well-guarded.

The env override parsing with safe fallback plus targeted patching of @flaky scenarios is a solid implementation.

tests/e2e/features/mcp.feature (1)

21-21: @flaky tagging across MCP success-path scenarios is consistent and appropriate.

This is a focused reliability change that preserves scenario logic while enabling retry where intermittent infrastructure variance is expected.

Also applies to: 38-38, 130-130, 147-147, 241-241, 261-261, 291-291, 308-308, 409-409, 429-429

Comment thread tests/e2e/features/environment.py
@radofuchs radofuchs merged commit fe35459 into lightspeed-core:main Apr 14, 2026
26 of 29 checks passed
Anxhela21 pushed a commit to Anxhela21/lightspeed-stack that referenced this pull request Apr 27, 2026
* add retry functionality to flaky e2e tests

---------

Co-authored-by: Radovan Fuchs <rfuchs@rfuchs-thinkpadp1gen7.tpb.csb>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants