
Conversation

@jrobertboos jrobertboos commented Jul 14, 2025

Description

This pull request implements support for delivering user chat transcripts and feedback to an ingress server at regular intervals.
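
At a high level the flow is: gather the collected feedback and transcript files, package them into a compressed archive, and send the archive to the ingress endpoint. The packaging step can be sketched as follows (the helper name is illustrative; the actual logic lives in `src/services/data_collector.py`):

```python
import tarfile
import time
from pathlib import Path


def package_files(paths: list[Path], dest_dir: Path) -> Path:
    """Bundle collected files into a gzipped tarball (sketch only)."""
    tarball = dest_dir / f"collection-{int(time.time())}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        for path in paths:
            # Store files by name so this sketch stays flat; the real
            # service preserves the relative directory structure.
            tar.add(path, arcname=path.name)
    return tarball
```

The service then POSTs the tarball and, depending on configuration, removes the local files afterwards.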

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Introduced a Data Collector Service that periodically collects user feedback and transcript data, packages it, and securely sends it to a configured server.
    • Added configuration options to enable, schedule, and customize the data collector’s behavior.
    • Provided a new Makefile target and command-line flag to run the data collector independently.
  • Documentation

    • Updated the README with detailed instructions and configuration guidance for the Data Collector Service.
  • Tests

    • Added comprehensive unit tests for the data collector configuration, runner, and service functionality.
  • Chores

    • Added development dependency for request type checking.

- Introduced DataCollectorConfiguration model to manage data collection settings.
- Updated lightspeed-stack.yaml to include data collector configuration options.
- Integrated data collector service startup and shutdown in main application flow.
- Introduced a new target in the Makefile to run the data collector service.
- Enhanced `lightspeed_stack.py` to support starting the data collector via command-line argument.
- Refactored `data_collector.py` to simplify the service's run method and remove unnecessary async handling.
- Cleaned up the main application startup process by removing the data collector's startup and shutdown events from `main.py`.
- Added `types-requests` to development dependencies for type checking.
- Modified `lightspeed_stack.py` to pass the data collector configuration to the `start_data_collector` function.
- Updated `DataCollectorConfiguration` model to enforce required fields when data archival is enabled.
- Refactored logging statements in `DataCollectorService` for improved clarity and consistency.
- Added unit tests for `UserDataCollection` to validate data collector configuration scenarios.
- Included `types-requests` in the development dependencies for type checking.
- Updated the lock file to reflect the new package and its version.
- Added relevant wheel and source information for `types-requests`.
- Updated exception handling in tests to use specific exception types (OSError, requests.RequestException, tarfile.TarError).
- Added new test for handling missing ingress server URL in tarball sending.
- Improved test coverage for the _perform_collection method to ensure proper exception catching.
- Renamed validation method in DataCollectorConfiguration for clarity.
- Updated error messages to reflect data collector context.
- Improved exception handling in start_data_collector function to log errors.
- Modified DataCollectorService to conditionally send feedback and transcript files based on configuration.
- Updated unit tests to cover new exception handling and configuration scenarios.
- Introduced new constants for data collector settings, including collection interval, connection timeout, and retry interval.
- Updated DataCollectorConfiguration to use these constants for default values and enforced positive integer types.
- Refactored error handling in tests to align with the new configuration model.
- Enhanced unit tests to validate default values and ensure proper configuration scenarios.
- Introduced `ingress_content_service_name` to the DataCollectorConfiguration model to specify the service name for data collection.
- Updated `lightspeed-stack.yaml` to include the new configuration option.
- Enhanced error handling to ensure `ingress_content_service_name` is required when the data collector is enabled.
- Modified the DataCollectorService to use the new service name in the content type header during data transmission.
- Updated unit tests to validate the new configuration and error scenarios.
- Bumped versions for `astroid` (3.3.10 to 3.3.11) and `certifi` (2025.7.9 to 2025.7.14) in the lock file, including updated source URLs and hashes.
- Removed outdated `typing-extensions` package entry from the lock file.
- Cleaned up import statements in `config.py` to resolve merge conflicts and improve readability.

coderabbitai bot commented Jul 14, 2025

Walkthrough

A new data collector feature was introduced, including configuration, service implementation, command-line integration, Makefile and documentation updates, and comprehensive unit tests. The data collector periodically gathers user feedback and transcript data, packages them, and sends them to an ingress server with configurable parameters and robust error handling.

Changes

File(s) Change Summary
Makefile, README.md, lightspeed-stack.yaml Added new Makefile target and documentation for running the data collector; extended config with data collector section.
pyproject.toml Added types-requests to development dependencies.
src/constants.py Introduced constants for data collector intervals and timeouts.
src/lightspeed_stack.py Added --data-collector argument; integrated data collector startup logic.
src/models/config.py Added DataCollectorConfiguration model and integrated it into UserDataCollection.
src/runners/data_collector.py New runner module for starting the data collector service.
src/services/init.py Added package docstring for services.
src/services/data_collector.py Implemented DataCollectorService for periodic data collection, packaging, sending, and cleanup.
tests/unit/models/test_config.py Added tests for data collector configuration validation and serialization.
tests/unit/runners/test_data_collector_runner.py New tests for data collector runner logic and error propagation.
tests/unit/services/test_data_collector.py Extensive tests for DataCollectorService covering collection, packaging, sending, and cleanup logic.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI/Makefile
    participant lightspeed_stack.py
    participant DataCollectorService

    User->>CLI/Makefile: make run-data-collector
    CLI/Makefile->>lightspeed_stack.py: python src/lightspeed_stack.py --data-collector
    lightspeed_stack.py->>DataCollectorService: start_data_collector(config)
    DataCollectorService->>DataCollectorService: run()
    DataCollectorService->>DataCollectorService: Collect feedback/transcript files
    DataCollectorService->>DataCollectorService: Package files into tarball
    DataCollectorService->>Ingress Server: Send tarball (HTTP POST)
    DataCollectorService->>DataCollectorService: Cleanup files/tarball
    DataCollectorService-->>lightspeed_stack.py: Wait for next interval or exit

Poem

A rabbit hops with data in tow,
Collecting feedback as onward we go.
Tarballs zipped and sent with care,
To the ingress server, bytes fill the air.
With cleanup and retries, it’s robust and neat—
The data collector’s work is now complete!
🐇📦✨


@jrobertboos jrobertboos marked this pull request as ready for review July 14, 2025 13:32
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
README.md (1)

312-354: Comprehensive documentation for the data collector service.

The documentation effectively covers all aspects of the new service including features, configuration, and usage instructions. The YAML configuration example is helpful and the command-line usage is clearly explained.

However, there are markdown linting issues that should be addressed for consistency.

Fix the markdown list style inconsistency flagged by static analysis:

-### Features
-
-- **Periodic Collection**: Runs at configurable intervals
-- **Data Packaging**: Packages feedback and transcript files into compressed tar.gz archives
-- **Secure Transmission**: Sends data to a configured ingress server with optional authentication
-- **File Cleanup**: Optionally removes local files after successful transmission
-- **Error Handling**: Includes retry logic and comprehensive error handling
+### Features
+
+* **Periodic Collection**: Runs at configurable intervals
+* **Data Packaging**: Packages feedback and transcript files into compressed tar.gz archives
+* **Secure Transmission**: Sends data to a configured ingress server with optional authentication
+* **File Cleanup**: Optionally removes local files after successful transmission
+* **Error Handling**: Includes retry logic and comprehensive error handling
src/services/data_collector.py (1)

25-50: Remove redundant null check for collection_interval.

The collection_interval field is defined as PositiveInt in the configuration model, which means it cannot be None. The null check on line 40 is redundant.

Apply this diff to remove the redundant check:

-                if collector_config.collection_interval is not None:
-                    time.sleep(collector_config.collection_interval)
+                time.sleep(collector_config.collection_interval)
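
To see why the check is redundant: pydantic's `PositiveInt` rejects both `None` and non-positive values at model construction time, so a validated config can never carry `None` in that field. A minimal sketch (assuming pydantic v2; the model and default here are illustrative):

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class CollectorCfg(BaseModel):
    # Mirrors the field under discussion; the default is illustrative.
    collection_interval: PositiveInt = 7200


cfg = CollectorCfg()  # default applies; guaranteed to be a positive int
try:
    CollectorCfg(collection_interval=0)  # rejected: not positive
except ValidationError:
    pass
try:
    CollectorCfg(collection_interval=None)  # rejected: None is not an int
except ValidationError:
    pass
```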
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c0acc8 and f8becbf.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • Makefile (1 hunks)
  • README.md (2 hunks)
  • lightspeed-stack.yaml (1 hunks)
  • pyproject.toml (1 hunks)
  • src/constants.py (1 hunks)
  • src/lightspeed_stack.py (3 hunks)
  • src/models/config.py (2 hunks)
  • src/runners/data_collector.py (1 hunks)
  • src/services/__init__.py (1 hunks)
  • src/services/data_collector.py (1 hunks)
  • tests/unit/models/test_config.py (5 hunks)
  • tests/unit/runners/test_data_collector_runner.py (1 hunks)
  • tests/unit/services/test_data_collector.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/runners/data_collector.py (2)
src/models/config.py (1)
  • DataCollectorConfiguration (88-110)
src/services/data_collector.py (2)
  • DataCollectorService (18-258)
  • run (25-49)
src/lightspeed_stack.py (2)
src/runners/data_collector.py (1)
  • start_data_collector (11-26)
src/configuration.py (2)
  • configuration (47-52)
  • user_data_collection_configuration (71-76)
tests/unit/models/test_config.py (1)
src/models/config.py (6)
  • Configuration (172-188)
  • LLamaStackConfiguration (57-85)
  • ServiceConfiguration (26-46)
  • UserDataCollection (113-131)
  • ModelContextProtocolServer (49-54)
  • DataCollectorConfiguration (88-110)
🪛 markdownlint-cli2 (0.17.2)
README.md

318-318: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


319-319: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


320-320: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


321-321: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


322-322: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🔇 Additional comments (20)
src/services/__init__.py (1)

1-1: LGTM: Clean package initialization

The package initialization file is properly structured with a clear docstring.

pyproject.toml (1)

50-50: LGTM: Appropriate type stub addition

The types-requests dependency addition is necessary for type checking the data collector's HTTP functionality.

Makefile (1)

11-12: LGTM: Consistent Makefile target

The new target follows the established pattern and provides a convenient way to run the data collector service.

src/constants.py (1)

46-49: LGTM: Well-chosen timing constants

The data collector constants have appropriate values and clear naming. The inline comments helpfully explain the time units.

src/runners/data_collector.py (1)

11-17: LGTM: Clean function structure with proper early return

The function signature is clear and the early return when data collection is disabled is a good practice.

tests/unit/runners/test_data_collector_runner.py (1)

1-61: Test coverage looks comprehensive and well-structured.

The test file effectively covers the key scenarios for the start_data_collector function:

  • Service execution when enabled
  • Service bypass when disabled
  • Exception propagation and handling

The use of mocking is appropriate to isolate the unit under test from the actual service implementation.

src/lightspeed_stack.py (3)

13-13: Clean integration of the data collector import.

The import follows the existing pattern and is properly organized with other runner imports.


51-57: Command-line argument addition is well-implemented.

The new --data-collector argument is properly configured with appropriate help text and default value. The naming convention is consistent with existing arguments.


81-84: Conditional logic correctly implements mutual exclusivity.

The implementation properly ensures that either the data collector or the web service runs, but not both simultaneously. The configuration access via user_data_collection_configuration.data_collector is correct based on the relevant code snippets.

tests/unit/models/test_config.py (4)

8-13: Import additions are properly organized.

The new constants and model imports are appropriately added and follow the existing import structure.


139-153: Positive test case is well-structured.

The test properly verifies that a correctly configured DataCollectorConfiguration with all required fields constructs successfully and sets the enabled flag properly.


155-184: Comprehensive validation testing.

The test effectively covers both required field validation scenarios:

  • Missing ingress_server_url when enabled
  • Missing ingress_content_service_name when enabled

The error messages are properly matched and the test structure is consistent.


423-431: JSON serialization test updated correctly.

The test verifies that the new data_collector section is properly included in the JSON dump with the correct default values from the constants.

README.md (1)

160-160: Make target addition is correctly documented.

The new run-data-collector target is properly added to the available targets list and follows the existing documentation format.

src/models/config.py (2)

88-111: Well-structured configuration model with proper validation.

The DataCollectorConfiguration class is well-designed with appropriate field types and validation logic. The use of PositiveInt for numeric fields and proper validation in the check_data_collector_configuration method ensures configuration consistency.


120-120: Clean integration with existing configuration structure.

The integration of the data collector configuration into the UserDataCollection model is clean and follows the established pattern with proper default initialization.

src/services/data_collector.py (3)

170-212: Robust HTTP transmission with proper error handling.

The tarball transmission logic is well-implemented with proper authentication handling, timeout configuration, and comprehensive error checking. The use of bearer token authentication and custom content-type headers follows best practices.


143-169: Efficient tarball creation with proper error handling.

The tarball creation method properly handles file addition errors and maintains directory structure. The use of relative paths and error logging for individual file failures ensures robustness.


222-251: Comprehensive cleanup logic for empty directories.

The cleanup method properly handles the nested directory structure (user/conversation) and gracefully handles permission errors. The logic correctly removes empty directories bottom-up.

tests/unit/services/test_data_collector.py (1)

1-588: Comprehensive test suite with excellent coverage.

The test suite provides thorough coverage of the DataCollectorService functionality including:

  • Normal operation and error conditions
  • File collection with various configuration states
  • Tarball creation and transmission
  • Cleanup operations with error handling
  • Proper mocking of dependencies and external resources

The tests are well-structured and follow good testing practices with clear test names and proper assertions.

Comment on lines +19 to +26
try:
service = DataCollectorService()
service.run()
except Exception as e:
logger.error(
"Data collector service encountered an exception: %s", e, exc_info=True
)
raise

💡 Verification agent

🧩 Analysis chain

Verify configuration consistency between runner and service

The runner receives a DataCollectorConfiguration parameter but from the relevant code snippets, the DataCollectorService appears to access configuration from a global configuration object (configuration.user_data_collection_configuration.data_collector). This could lead to inconsistency where the runner thinks data collection is enabled but the service has different configuration.

Consider either:

  1. Passing the configuration to the service constructor
  2. Using the same global configuration source in both places
  3. Adding validation to ensure consistency

🏁 Script executed:

#!/bin/bash
# Check how DataCollectorService accesses configuration
ast-grep --pattern 'class DataCollectorService {
  $$$
}'

# Check for configuration access patterns in the service
rg -A 10 "configuration\." src/services/data_collector.py

Length of output: 3498


I’d like to confirm how the runner receives and passes the configuration. Let’s pull the full runner code:


🏁 Script executed:

#!/bin/bash
sed -n '1,200p' src/runners/data_collector.py

Length of output: 783


Continuing verification of how configuration is passed into the service:


🏁 Script executed:

#!/bin/bash
# Show imports and top of DataCollectorService
sed -n '1,50p' src/services/data_collector.py

# Locate DataCollectorConfiguration definition
rg -n "class DataCollectorConfiguration" -A5 src/models/config.py

Length of output: 2088


Synchronize runner and service configuration sources

The start_data_collector runner takes a DataCollectorConfiguration parameter but never passes it into DataCollectorService, which always reads from the global configuration.user_data_collection_configuration.data_collector. This disconnect can cause the service to run with different settings than the runner expects.

Please update the implementation so both use the same config:

  • In src/runners/data_collector.py, pass the runner’s configuration into the service.
  • In src/services/data_collector.py, add a constructor (or method parameter) to accept and store that config instead of pulling from the global.
  • Replace all occurrences of configuration.user_data_collection_configuration.data_collector in the service with the injected config object.

Example diff:

# src/runners/data_collector.py
- service = DataCollectorService()
+ service = DataCollectorService(configuration)

# src/services/data_collector.py
-class DataCollectorService:  # pylint: disable=too-few-public-methods
+class DataCollectorService:  # pylint: disable=too-few-public-methods
+    def __init__(self, config: DataCollectorConfiguration):
+        self._collector_config = config

     def run(self) -> None:
-        collector_config = configuration.user_data_collection_configuration.data_collector
+        collector_config = self._collector_config
         logger.info("Starting data collection service")
         …

Comment on lines +23 to +30
data_collector:
enabled: false
ingress_server_url: null
ingress_server_auth_token: null
ingress_content_service_name: null
collection_interval: 7200 # 2 hours in seconds
cleanup_after_send: true
connection_timeout_seconds: 30

💡 Verification agent

🧩 Analysis chain

Configuration section is well-structured with appropriate defaults.

The new data_collector section includes all necessary parameters with sensible default values. The service is disabled by default, which is appropriate for a new feature.

However, there's a potential field name inconsistency that should be verified.

Verify the field name consistency between the YAML configuration and the model definition:

The YAML uses connection_timeout_seconds while the model likely uses connection_timeout. This inconsistency could cause configuration parsing issues.


🏁 Script executed:

#!/bin/bash
# Check if the field name in YAML matches the model definition
echo "Checking field name in YAML configuration:"
grep -n "connection_timeout" lightspeed-stack.yaml

echo -e "\nChecking field name in model definition:"
ast-grep --pattern 'connection_timeout$_: $_' src/models/config.py

Length of output: 349


🏁 Script executed:

#!/bin/bash
# Search for 'connection_timeout_seconds' in model definitions
echo "Searching for 'connection_timeout_seconds' in model definitions:"
grep -R "connection_timeout_seconds" -n src/models

echo -e "\nSearching for DataCollector class definition and its fields:"
grep -R "class DataCollector" -n src/models

Length of output: 447


#!/bin/bash
# Locate the DataCollectorConfiguration class and list its fields
grep -n "class DataCollectorConfiguration" src/models/config.py
grep -A20 "class DataCollectorConfiguration" src/models/config.py

# Search specifically for timeout-related fields
grep -n "timeout" src/models/config.py

🏁 Script executed:

#!/bin/bash
# Locate the DataCollectorConfiguration class and list its fields
grep -n "class DataCollectorConfiguration" src/models/config.py
grep -A20 "class DataCollectorConfiguration" src/models/config.py

# Search specifically for timeout-related fields within the class
sed -n '88,120p' src/models/config.py | grep -n "timeout"

Length of output: 1387


Rename connection_timeout_seconds to connection_timeout in YAML

The Pydantic model DataCollectorConfiguration defines its timeout field as connection_timeout, so the YAML key connection_timeout_seconds will not be picked up (and may raise an “extra field” error or be ignored).

Please update lightspeed-stack.yaml at line 30:

   cleanup_after_send: true
-  connection_timeout_seconds: 30
+  connection_timeout: 30

This ensures the configuration maps correctly to DataCollectorConfiguration.connection_timeout.
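
The failure mode is easy to demonstrate: with pydantic v2's default model settings, unknown keys are silently ignored, so the mistyped YAML key would leave the field at its default rather than erroring (sketch, assuming pydantic v2 and `extra` left at its default of `"ignore"`):

```python
from pydantic import BaseModel


class DataCollectorCfg(BaseModel):
    # Field name as defined in the model.
    connection_timeout: int = 30


# The mistyped key is silently dropped; the default wins.
cfg = DataCollectorCfg(**{"connection_timeout_seconds": 99})
assert cfg.connection_timeout == 30
```

With `model_config = ConfigDict(extra="forbid")` the same construction would instead raise a validation error, which is the "extra field" behavior mentioned above.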


@tisnik tisnik left a comment

it looks pretty ok

@tisnik tisnik merged commit cb329b7 into lightspeed-core:main Jul 14, 2025
33 checks passed
ingress_content_service_name: "lightspeed-team"
collection_interval: 7200 # 2 hours in seconds
cleanup_after_send: true
connection_timeout: 30

Shouldn't it be a separate root-level configuration? Like

user_data_collection:
  feedback_disabled: false
  feedback_storage: "/tmp/data/feedback"
  transcripts_disabled: false
  transcripts_storage: "/tmp/data/transcripts"
user_data_export:
  enabled: true
  ingress_server_url: "https://your-ingress-server.com"
  ingress_server_auth_token: "your-auth-token"
  ingress_content_service_name: "lightspeed-team"
  collection_interval: 7200  # 2 hours in seconds
  cleanup_after_send: true
  connection_timeout: 30

As those are two independent pieces.
