Skip to content

Conversation

@radofuchs
Copy link
Contributor

@radofuchs radofuchs commented Oct 31, 2025

add additional e2e test for streaming query endpoint and enable color output

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Bug Fixes

    • Improved error messages and assertions for streaming response validation
    • Refined token handling in response streaming to skip initial protocol messages
  • Tests

    • Added authorization header validation to streaming query tests
    • Introduced authentication failure scenario testing
    • Enhanced test output formatting with colored, pretty-printed results

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 31, 2025

Walkthrough

This PR enhances e2e tests with authorization header validation for streaming queries, improves terminal output visibility through environment configuration, refines streaming response token processing with first-token skipping logic, and adds validation for unauthenticated requests.

Changes

Cohort / File(s) Summary
Test Environment Configuration
.github/workflows/e2e_tests.yaml, Makefile
Adds TERM and FORCE_COLOR environment variables to e2e test execution and wraps command with script utility for enhanced terminal output and color formatting
Streaming Query Test Scenarios
tests/e2e/features/streaming_query.feature
Adds Authorization header requirements across streaming query test scenarios and introduces new scenario to validate unauthenticated request rejection with 400 status
Streaming Response Step Implementation
tests/e2e/features/steps/llm_query_response.py
Introduces first_token flag to skip initial token in streaming responses and adds explicit error messages to assertion validations

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • The first_token flag logic should be verified to handle all streaming response edge cases correctly
  • Authorization header changes across multiple test scenarios should be confirmed for consistency
  • Ensure terminal output configuration (TERM, FORCE_COLOR, script wrapper) doesn't inadvertently mask or obscure test failures

Possibly related PRs

Suggested reviewers

  • tisnik

Poem

🐰 Streams flow with auth so bright,
Colors dance in TERM's light,
Skip the first, take the rest,
E2E tests now pass the test!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "LCORE-758: add additional e2e test for streaming query endpoint and enable color output" directly aligns with the main objectives and changes in the PR. The title accurately captures the two primary changes: the addition of new e2e tests for the streaming query endpoint (visible in the streaming_query.feature file with new authorization scenarios and an unauthenticated request test) and the enablement of color output (visible in the .github/workflows/e2e_tests.yaml and Makefile modifications). The supporting changes to the streaming parser logic in llm_query_response.py are implementation details that enable the e2e tests to function correctly, rather than primary objectives. The title is specific, concise, and clearly conveys the main purpose of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@radofuchs radofuchs requested a review from tisnik October 31, 2025 11:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
Makefile (1)

24-24: Consider the impact of removing PYTHONDONTWRITEBYTECODE.

The previous command set PYTHONDONTWRITEBYTECODE=1 to prevent Python bytecode (.pyc) files from being created. Removing this may allow .pyc files to be generated during e2e test runs, which could affect test reproducibility or cause clutter in the test directories.

If you want to maintain the previous behavior, apply this diff:

-	script -q -e -c "uv run behave --color --format pretty --tags=-skip -D dump_errors=true @tests/e2e/test_list.txt"
+	PYTHONDONTWRITEBYTECODE=1 script -q -e -c "uv run behave --color --format pretty --tags=-skip -D dump_errors=true @tests/e2e/test_list.txt"
tests/e2e/features/steps/llm_query_response.py (1)

182-185: Consider adding validation before skipping the first token.

Instead of unconditionally skipping the first token, consider validating that it actually contains the expected shield/status message before discarding it. This would make the code more robust and provide better error messages if the expected format changes.

Apply this diff to add validation:

                 elif event == "token":
-                    # Skip the first token (shield status message)
                     if first_token:
                         first_token = False
-                        continue
+                        # Skip the first token if it's a shield/status message
+                        token = data["data"]["token"]
+                        # Add your shield message detection logic here
+                        # For now, skip if empty or matches known pattern
+                        if not token or token.strip() == "":
+                            continue
                     full_response_split.append(data["data"]["token"])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 980e871 and 09fbe04.

📒 Files selected for processing (4)
  • .github/workflows/e2e_tests.yaml (1 hunks)
  • Makefile (1 hunks)
  • tests/e2e/features/steps/llm_query_response.py (3 hunks)
  • tests/e2e/features/streaming_query.feature (9 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (CLAUDE.md)

Write E2E tests as Gherkin feature files for behave

Files:

  • tests/e2e/features/streaming_query.feature
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • tests/e2e/features/steps/llm_query_response.py
tests/e2e/features/steps/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place behave step definitions under tests/e2e/features/steps/

Files:

  • tests/e2e/features/steps/llm_query_response.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/e2e/features/steps/llm_query_response.py
🧠 Learnings (5)
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/features/streaming_query.feature
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/features/streaming_query.feature
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/test_list.txt : Maintain the E2E test list in tests/e2e/test_list.txt

Applied to files:

  • .github/workflows/e2e_tests.yaml
  • Makefile
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/features/**/*.feature : Write E2E tests as Gherkin feature files for behave

Applied to files:

  • .github/workflows/e2e_tests.yaml
  • Makefile
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/{unit,integration}/**/*.py : Use pytest for all unit and integration tests

Applied to files:

  • Makefile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (8)
.github/workflows/e2e_tests.yaml (1)

196-198: LGTM! Proper configuration for color output in CI.

The environment variables correctly enable 256-color terminal support and force color output in the CI environment, which aligns with the Makefile changes that add --color and --format pretty flags to behave.

tests/e2e/features/streaming_query.feature (4)

1-1: LGTM! Feature-level authorization tag added.

The @Authorized tag correctly enables authorization testing for all scenarios in this feature. Based on learnings, this triggers the specialized noop-with-token configuration for these tests.


46-65: Verify authorization header usage in conversation continuation scenario.

Line 48 sets the Authorization header, but lines 49 and 55 use the plain "ask question" step instead of "ask question with authorization header" like other scenarios. This inconsistency may indicate:

  1. The authorization header should be automatically carried over in the same conversation (if that's the intended behavior)
  2. These steps should be updated to use "with authorization header" for consistency

Please clarify the intended behavior for authorization in multi-turn conversations.


125-135: LGTM! Proper validation of missing authorization.

This new scenario correctly tests that requests without the Authorization header are rejected with a 400 status and appropriate error message. This complements the authenticated scenarios and ensures proper authorization enforcement.


11-12: Truncated JWT token is an intentional test fixture.

The token eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva is consistently reused across all scenarios in this file and other feature files in the test suite (query.feature, conversations.feature, feedback.feature, authorized_noop*.feature). This uniformity confirms it is a deliberate test fixture, not an accidental truncation.

The tests verify API endpoint behavior when the Authorization header is provided or absent—they do not validate JWT token structure or content. Using a simplified token for testing is standard practice when the auth mechanism is not the focus of the test. The actual JWT validation occurs in the application's auth middleware, which is outside the scope of these endpoint tests.

Likely an incorrect or invalid review comment.

tests/e2e/features/steps/llm_query_response.py (2)

158-161: LGTM! Enhanced assertions improve test failure diagnostics.

The explicit error messages provide clear information when tests fail, making it easier to diagnose issues. The comparison output showing both responses is particularly helpful for debugging mismatches.


171-171: Document or conditionally handle the first-token skip logic in the e2e streaming response parser.

The code unconditionally skips the first "token" event in _parse_streaming_response, citing a "shield status message." However:

  • No documentation explains why or which providers/configurations require this skip
  • No unit tests verify this behavior
  • Different LLM providers may have different streaming response formats, making this assumption unreliable
  • The "turn_complete" event already contains the full response, so this skip only affects intermediate tokens

Either document the requirement with specific provider details, make the behavior configurable based on the provider being tested, or verify token content before skipping (e.g., check if it matches a known shield/status pattern).

Makefile (1)

24-24: ---

script utility is not available in CI—this will cause e2e tests to fail.

The script utility from util-linux is not installed in the GitHub Actions runner. The command script -q -e -c "uv run behave ..." will fail immediately.

Either install util-linux before running tests, or remove the script wrapper and handle output/exit codes directly:

uv run behave --color --format pretty --tags=-skip -D dump_errors=true @tests/e2e/test_list.txt
⛔ Skipped due to learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/test_list.txt : Maintain the E2E test list in tests/e2e/test_list.txt
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/features/**/*.feature : Write E2E tests as Gherkin feature files for behave

Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik merged commit 9cce972 into lightspeed-core:main Oct 31, 2025
20 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants