LCORE-1472: Add e2e retriy capability by radofuchs · Pull Request #1500 · lightspeed-core/lightspeed-stack

radofuchs · 2026-04-14T11:07:41Z

Description

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

Tests
- Enhanced test reliability with automatic retry support for identified unstable tests
- Improved test assertion accuracy with case-insensitive matching to reduce false negatives

coderabbitai · 2026-04-14T11:07:56Z

Warning

Rate limit exceeded

@radofuchs has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 25 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 46 minutes and 25 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 49a24790-1da6-4227-ac58-dd522ae0cc29

📥 Commits

Reviewing files that changed from the base of the PR and between 1c6ed3d and c3b8193.

📒 Files selected for processing (6)

tests/e2e/features/environment.py
tests/e2e/features/mcp.feature
tests/e2e/features/query.feature
tests/e2e/features/responses.feature
tests/e2e/features/responses_streaming.feature
tests/e2e/features/streaming_query.feature

Walkthrough

The changes introduce opt-in auto-retry infrastructure for flaky e2e tests and apply @flaky tags across multiple test scenarios. Additionally, response assertion logic is refactored to perform case-insensitive substring matching using a new helper function.

Changes

Cohort / File(s)	Summary
Flaky Retry Infrastructure `tests/e2e/features/environment.py`	Implemented scenario auto-retry support in `before_feature` hook; imports `patch_scenario_with_autoretry`, computes `max_flaky` retry cap from environment configuration, and patches scenarios tagged `@flaky` to retry on failures.
Flaky Test Tags `tests/e2e/features/mcp.feature`, `tests/e2e/features/query.feature`, `tests/e2e/features/responses.feature`, `tests/e2e/features/responses_streaming.feature`, `tests/e2e/features/streaming_query.feature`	Added `@flaky` tags to success-path e2e scenarios across multiple feature files, including MCP authentication flows, LLM response tests, streaming query tests, and conversation management tests. One minor formatting change (blank line removal) in `streaming_query.feature`.
Response Assertion Helpers `tests/e2e/features/steps/llm_query_response.py`	Introduced `_response_contains_fragment()` helper function for case-insensitive substring matching; refactored `check_fragments_in_responses_output_text()`, `check_fragments_in_response()`, and `check_streamed_fragments_in_response()` to use the helper and updated assertion messages accordingly.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title contains a typo ('retriy' instead of 'retry') and is vague about scope; while it references the core intent (retry capability), the typo and lack of specificity about which e2e tests are affected reduce clarity.	Correct the typo to 'Add e2e retry capability' for accuracy and clarity.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tisnik

LGTM

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/features/environment.py`:
- Around line 38-39: The _E2E_FEATURE_PERF_START_ATTR constant and timing doc
are stale because no timestamp is recorded or used; either remove the
constant/docs or wire the timing by setting
feature._lightspeed_e2e_feature_perf_start in the Behave hook before_feature and
computing/consuming the duration in after_feature (use time.time() or
datetime.now()), then store or log the computed duration (e.g., on feature or
context) so the declared feature-duration tracking is actually recorded and
consumed; update or remove any related docstrings and references accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c1d5cb2-4f4e-4894-83eb-952383ac648d

📥 Commits

Reviewing files that changed from the base of the PR and between 6acfc16 and 1c6ed3d.

📒 Files selected for processing (7)

tests/e2e/features/environment.py
tests/e2e/features/mcp.feature
tests/e2e/features/query.feature
tests/e2e/features/responses.feature
tests/e2e/features/responses_streaming.feature
tests/e2e/features/steps/llm_query_response.py
tests/e2e/features/streaming_query.feature

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: unit_tests (3.13)
GitHub Check: unit_tests (3.12)
GitHub Check: build-pr
GitHub Check: integration_tests (3.12)
GitHub Check: list_outdated_dependencies
GitHub Check: integration_tests (3.13)
GitHub Check: Pylinter
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: E2E: library mode / ci
GitHub Check: E2E: server mode / ci
GitHub Check: E2E Tests for Lightspeed Evaluation job

🧰 Additional context used

📓 Path-based instructions (2)

tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

tests/e2e/features/query.feature
tests/e2e/features/responses.feature
tests/e2e/features/streaming_query.feature
tests/e2e/features/responses_streaming.feature
tests/e2e/features/mcp.feature

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

tests/e2e/features/steps/llm_query_response.py
tests/e2e/features/environment.py

🧠 Learnings (9)

📓 Common learnings

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests

📚 Learning: 2026-04-05T12:19:36.009Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Applied to files:

tests/e2e/features/query.feature

📚 Learning: 2026-04-05T12:19:36.009Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Place E2E step definitions in `tests/e2e/features/steps/`

Applied to files:

tests/e2e/features/query.feature

📚 Learning: 2026-04-05T12:19:36.009Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Maintain E2E test list in `tests/e2e/test_list.txt`

Applied to files:

tests/e2e/features/query.feature

📚 Learning: 2025-09-02T11:09:40.404Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

tests/e2e/features/query.feature
tests/e2e/features/mcp.feature

📚 Learning: 2025-09-02T11:15:02.411Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

tests/e2e/features/query.feature

📚 Learning: 2026-04-07T09:20:26.590Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.

Applied to files:

tests/e2e/features/steps/llm_query_response.py
tests/e2e/features/environment.py

📚 Learning: 2026-04-13T13:39:54.963Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.

Applied to files:

tests/e2e/features/steps/llm_query_response.py
tests/e2e/features/environment.py

📚 Learning: 2026-02-19T10:06:58.708Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:58.708Z
Learning: In tests/e2e-prow/rhoai/, ConfigMaps like mock-jwks-script and mcp-mock-server-script are created dynamically by the pipeline.sh deployment script using `oc create configmap` commands, rather than being defined as static ConfigMap resources in the manifest YAML files.

Applied to files:

tests/e2e/features/mcp.feature

🔇 Additional comments (7)

tests/e2e/features/steps/llm_query_response.py (1)

28-30: Good consistency improvement for fragment assertions.

The case-insensitive helper is applied consistently across all three fragment-checking steps, which should reduce brittle failures caused only by response capitalization while preserving clear failure output.

Also applies to: 246-247, 259-262, 272-273, 298-301, 311-312, 322-325

tests/e2e/features/responses.feature (1)

337-337: @flaky tagging is correctly scoped.

Nice targeted application of retry behavior to the known unstable scenario without changing its test intent.

tests/e2e/features/query.feature (1)

13-13: Flaky annotations look appropriate for these query scenarios.

The change is focused and keeps scenario behavior unchanged while enabling retry only where needed.

Also applies to: 26-26

tests/e2e/features/responses_streaming.feature (1)

328-328: Good targeted retry enablement for streaming conversation-flow scenarios.

These @flaky tags are applied narrowly to the high-variance paths and preserve existing assertions.

Also applies to: 359-359, 381-381

tests/e2e/features/streaming_query.feature (1)

22-22: Scoped @flaky updates look good.

Retry is enabled only for the unstable streaming-query prompt-variant scenarios, with no behavioral drift.

Also applies to: 36-36

tests/e2e/features/environment.py (1)

374-382: Flaky autoretry wiring is clean and well-guarded.

The env override parsing with safe fallback plus targeted patching of @flaky scenarios is a solid implementation.

tests/e2e/features/mcp.feature (1)

21-21: @flaky tagging across MCP success-path scenarios is consistent and appropriate.

This is a focused reliability change that preserves scenario logic while enabling retry where intermittent infrastructure variance is expected.

Also applies to: 38-38, 130-130, 147-147, 241-241, 261-261, 291-291, 308-308, 409-409, 429-429

* add retry functionality to flaky e2e tests --------- Co-authored-by: Radovan Fuchs <rfuchs@rfuchs-thinkpadp1gen7.tpb.csb>

Radovan Fuchs and others added 3 commits April 14, 2026 11:17

add retry functionality to flaky e2e tests

ac12850

fix

87e1eb3

Merge branch 'lightspeed-core:main' into LCORE-1472_e2e_retries

1c6ed3d

tisnik approved these changes Apr 14, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread tests/e2e/features/environment.py

Merge branch 'main' into LCORE-1472_e2e_retries

c3b8193

radofuchs merged commit fe35459 into lightspeed-core:main Apr 14, 2026
26 of 29 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LCORE-1472: Add e2e retriy capability#1500

LCORE-1472: Add e2e retriy capability#1500
radofuchs merged 4 commits into
lightspeed-core:mainfrom
radofuchs:LCORE-1472_e2e_retries

radofuchs commented Apr 14, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 14, 2026 •

edited

Loading

Rate limit exceeded

❌ Failed checks (1 inconclusive)

Uh oh!

tisnik left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

radofuchs commented Apr 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 inconclusive)

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

radofuchs commented Apr 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 14, 2026 •

edited

Loading