LCORE-302: User Data Collection #237
Conversation
- Introduced DataCollectorConfiguration model to manage data collection settings.
- Updated lightspeed-stack.yaml to include data collector configuration options.
- Integrated data collector service startup and shutdown in main application flow.
- Introduced a new target in the Makefile to run the data collector service.
- Enhanced `lightspeed_stack.py` to support starting the data collector via command-line argument.
- Refactored `data_collector.py` to simplify the service's run method and remove unnecessary async handling.
- Cleaned up the main application startup process by removing the data collector's startup and shutdown events from `main.py`.
- Added `types-requests` to development dependencies for type checking.
- Modified `lightspeed_stack.py` to pass the data collector configuration to the `start_data_collector` function.
- Updated `DataCollectorConfiguration` model to enforce required fields when data archival is enabled.
- Refactored logging statements in `DataCollectorService` for improved clarity and consistency.
- Added unit tests for `UserDataCollection` to validate data collector configuration scenarios.
- Included `types-requests` in the development dependencies for type checking; updated the lock file to reflect the new package and its version, with relevant wheel and source information.
- Updated exception handling in tests to use specific exception types (OSError, requests.RequestException, tarfile.TarError).
- Added a new test for handling a missing ingress server URL in tarball sending.
- Improved test coverage for the _perform_collection method to ensure proper exception catching.
- Renamed the validation method in DataCollectorConfiguration for clarity and updated error messages to reflect the data collector context.
- Improved exception handling in start_data_collector to log errors.
- Modified DataCollectorService to conditionally send feedback and transcript files based on configuration.
- Introduced new constants for data collector settings (collection interval, connection timeout, retry interval) and updated DataCollectorConfiguration to use them as defaults with enforced positive integer types.
- Introduced `ingress_content_service_name` in the DataCollectorConfiguration model to specify the service name for data collection; updated `lightspeed-stack.yaml`, required the field when the data collector is enabled, and used it in the content type header during data transmission.
- Updated unit tests throughout to cover the new configuration, validation, and error scenarios.
- Bumped versions for `astroid` (3.3.10 to 3.3.11) and `certifi` (2025.7.9 to 2025.7.14) in the lock file, including updated source URLs and hashes.
- Removed the outdated `typing-extensions` package entry from the lock file.
- Cleaned up import statements in `config.py` to resolve merge conflicts and improve readability.
Walkthrough

A new data collector feature was introduced, including configuration, service implementation, command-line integration, Makefile and documentation updates, and comprehensive unit tests. The data collector periodically gathers user feedback and transcript data, packages them, and sends them to an ingress server with configurable parameters and robust error handling.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI/Makefile
    participant lightspeed_stack.py
    participant DataCollectorService
    participant Ingress Server
    User->>CLI/Makefile: make run-data-collector
    CLI/Makefile->>lightspeed_stack.py: python src/lightspeed_stack.py --data-collector
    lightspeed_stack.py->>DataCollectorService: start_data_collector(config)
    DataCollectorService->>DataCollectorService: run()
    DataCollectorService->>DataCollectorService: Collect feedback/transcript files
    DataCollectorService->>DataCollectorService: Package files into tarball
    DataCollectorService->>Ingress Server: Send tarball (HTTP POST)
    DataCollectorService->>DataCollectorService: Cleanup files/tarball
    DataCollectorService-->>lightspeed_stack.py: Wait for next interval or exit
```
Actionable comments posted: 2
🧹 Nitpick comments (2)
README.md (1)
312-354: Comprehensive documentation for the data collector service.

The documentation effectively covers all aspects of the new service, including features, configuration, and usage instructions. The YAML configuration example is helpful and the command-line usage is clearly explained.
However, there are markdown linting issues that should be addressed for consistency.
Fix the markdown list style inconsistency flagged by static analysis:
```diff
-### Features
-
-- **Periodic Collection**: Runs at configurable intervals
-- **Data Packaging**: Packages feedback and transcript files into compressed tar.gz archives
-- **Secure Transmission**: Sends data to a configured ingress server with optional authentication
-- **File Cleanup**: Optionally removes local files after successful transmission
-- **Error Handling**: Includes retry logic and comprehensive error handling
+### Features
+
+* **Periodic Collection**: Runs at configurable intervals
+* **Data Packaging**: Packages feedback and transcript files into compressed tar.gz archives
+* **Secure Transmission**: Sends data to a configured ingress server with optional authentication
+* **File Cleanup**: Optionally removes local files after successful transmission
+* **Error Handling**: Includes retry logic and comprehensive error handling
```

src/services/data_collector.py (1)
25-50: Remove redundant null check for `collection_interval`.

The `collection_interval` field is defined as `PositiveInt` in the configuration model, which means it cannot be `None`. The null check on line 40 is redundant. Apply this diff to remove the redundant check:
```diff
-        if collector_config.collection_interval is not None:
-            time.sleep(collector_config.collection_interval)
+        time.sleep(collector_config.collection_interval)
```
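For illustration, a minimal sketch (assuming Pydantic v2; the model and field names here are hypothetical stand-ins for the PR's actual configuration class) of why a `PositiveInt` field cannot be `None`:

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class SleepConfig(BaseModel):
    # PositiveInt requires an int > 0; None (or 0) is rejected at validation time
    collection_interval: PositiveInt = 7200


cfg = SleepConfig()
print(cfg.collection_interval)  # 7200

try:
    SleepConfig(collection_interval=None)
except ValidationError as e:
    print("rejected:", len(e.errors()), "error(s)")
```

Since validation already guarantees a positive integer, any later `is not None` guard is dead code.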
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (13)
- Makefile (1 hunks)
- README.md (2 hunks)
- lightspeed-stack.yaml (1 hunks)
- pyproject.toml (1 hunks)
- src/constants.py (1 hunks)
- src/lightspeed_stack.py (3 hunks)
- src/models/config.py (2 hunks)
- src/runners/data_collector.py (1 hunks)
- src/services/__init__.py (1 hunks)
- src/services/data_collector.py (1 hunks)
- tests/unit/models/test_config.py (5 hunks)
- tests/unit/runners/test_data_collector_runner.py (1 hunks)
- tests/unit/services/test_data_collector.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/runners/data_collector.py (2)
- src/models/config.py (1)
  - DataCollectorConfiguration (88-110)
- src/services/data_collector.py (2)
  - DataCollectorService (18-258)
  - run (25-49)

src/lightspeed_stack.py (2)
- src/runners/data_collector.py (1)
  - start_data_collector (11-26)
- src/configuration.py (2)
  - configuration (47-52)
  - user_data_collection_configuration (71-76)

tests/unit/models/test_config.py (1)
- src/models/config.py (6)
  - Configuration (172-188)
  - LLamaStackConfiguration (57-85)
  - ServiceConfiguration (26-46)
  - UserDataCollection (113-131)
  - ModelContextProtocolServer (49-54)
  - DataCollectorConfiguration (88-110)
🪛 markdownlint-cli2 (0.17.2)
README.md
- 318: Unordered list style: expected asterisk, actual dash (MD004, ul-style)
- 319: Unordered list style: expected asterisk, actual dash (MD004, ul-style)
- 320: Unordered list style: expected asterisk, actual dash (MD004, ul-style)
- 321: Unordered list style: expected asterisk, actual dash (MD004, ul-style)
- 322: Unordered list style: expected asterisk, actual dash (MD004, ul-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🔇 Additional comments (20)
src/services/__init__.py (1)
1-1: LGTM: Clean package initialization. The package initialization file is properly structured with a clear docstring.
pyproject.toml (1)
50-50: LGTM: Appropriate type stub addition. The `types-requests` dependency addition is necessary for type checking the data collector's HTTP functionality.

Makefile (1)
11-12: LGTM: Consistent Makefile target. The new target follows the established pattern and provides a convenient way to run the data collector service.
src/constants.py (1)
46-49: LGTM: Well-chosen timing constants. The data collector constants have appropriate values and clear naming. The inline comments helpfully explain the time units.
src/runners/data_collector.py (1)
11-17: LGTM: Clean function structure with proper early return. The function signature is clear, and the early return when data collection is disabled is good practice.
tests/unit/runners/test_data_collector_runner.py (1)
1-61: Test coverage looks comprehensive and well-structured. The test file effectively covers the key scenarios for the `start_data_collector` function:
- Service execution when enabled
- Service bypass when disabled
- Exception propagation and handling
The use of mocking is appropriate to isolate the unit under test from the actual service implementation.
src/lightspeed_stack.py (3)
13-13: Clean integration of the data collector import. The import follows the existing pattern and is properly organized with the other runner imports.
51-57: Command-line argument addition is well-implemented. The new `--data-collector` argument is properly configured with appropriate help text and a default value. The naming convention is consistent with existing arguments.
81-84: Conditional logic correctly implements mutual exclusivity. The implementation properly ensures that either the data collector or the web service runs, but not both simultaneously. The configuration access via `user_data_collection_configuration.data_collector` is correct based on the relevant code snippets.

tests/unit/models/test_config.py (4)
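The either/or dispatch described above can be sketched with `argparse` (names and return values here are illustrative, not the PR's actual code):

```python
import argparse


def main(argv=None):
    parser = argparse.ArgumentParser(prog="lightspeed_stack")
    # Flag selects the data collector instead of the web service
    parser.add_argument(
        "--data-collector",
        action="store_true",
        default=False,
        help="run the data collector service instead of the web server",
    )
    args = parser.parse_args(argv)

    if args.data_collector:
        return "data-collector"  # would call start_data_collector(config)
    return "web-service"         # would launch the web application


print(main(["--data-collector"]))  # data-collector
print(main([]))                    # web-service
```

Because the branch returns (or blocks) in each arm, exactly one of the two services runs per invocation.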
8-13: Import additions are properly organized. The new constants and model imports are appropriately added and follow the existing import structure.
139-153: Positive test case is well-structured. The test properly verifies that a correctly configured `DataCollectorConfiguration` with all required fields constructs successfully and sets the enabled flag properly.
155-184: Comprehensive validation testing. The test effectively covers both required-field validation scenarios:

- Missing `ingress_server_url` when enabled
- Missing `ingress_content_service_name` when enabled

The error messages are properly matched and the test structure is consistent.
423-431: JSON serialization test updated correctly. The test verifies that the new `data_collector` section is properly included in the JSON dump with the correct default values from the constants.

README.md (1)
160-160: Make target addition is correctly documented. The new `run-data-collector` target is properly added to the available targets list and follows the existing documentation format.

src/models/config.py (2)
88-111: Well-structured configuration model with proper validation. The `DataCollectorConfiguration` class is well-designed with appropriate field types and validation logic. The use of `PositiveInt` for numeric fields and proper validation in the `check_data_collector_configuration` method ensures configuration consistency.
120-120: Clean integration with existing configuration structure. The integration of the data collector configuration into the `UserDataCollection` model is clean and follows the established pattern with proper default initialization.
170-212: Robust HTTP transmission with proper error handling. The tarball transmission logic is well-implemented with proper authentication handling, timeout configuration, and comprehensive error checking. The use of bearer token authentication and custom content-type headers follows best practices.
143-169: Efficient tarball creation with proper error handling. The tarball creation method properly handles file addition errors and maintains directory structure. The use of relative paths and error logging for individual file failures ensures robustness.
222-251: Comprehensive cleanup logic for empty directories. The cleanup method properly handles the nested directory structure (user/conversation) and gracefully handles permission errors. The logic correctly removes empty directories bottom-up.
tests/unit/services/test_data_collector.py (1)
1-588: Comprehensive test suite with excellent coverage. The test suite provides thorough coverage of the `DataCollectorService` functionality, including:
- Normal operation and error conditions
- File collection with various configuration states
- Tarball creation and transmission
- Cleanup operations with error handling
- Proper mocking of dependencies and external resources
The tests are well-structured and follow good testing practices with clear test names and proper assertions.
```python
    try:
        service = DataCollectorService()
        service.run()
    except Exception as e:
        logger.error(
            "Data collector service encountered an exception: %s", e, exc_info=True
        )
        raise
```
💡 Verification agent
🧩 Analysis chain
Verify configuration consistency between runner and service
The runner receives a `DataCollectorConfiguration` parameter, but from the relevant code snippets the `DataCollectorService` appears to access configuration from a global configuration object (`configuration.user_data_collection_configuration.data_collector`). This could lead to inconsistency where the runner thinks data collection is enabled but the service has different configuration.
Consider either:
- Passing the configuration to the service constructor
- Using the same global configuration source in both places
- Adding validation to ensure consistency
🏁 Script executed:
```shell
#!/bin/bash
# Check how DataCollectorService accesses configuration
ast-grep --pattern 'class DataCollectorService {
  $$$
}'

# Check for configuration access patterns in the service
rg -A 10 "configuration\." src/services/data_collector.py
```

Length of output: 3498
I’d like to confirm how the runner receives and passes the configuration. Let’s pull the full runner code:
🏁 Script executed:
```shell
#!/bin/bash
sed -n '1,200p' src/runners/data_collector.py
```

Length of output: 783
Continuing verification of how configuration is passed into the service:
🏁 Script executed:
```shell
#!/bin/bash
# Show imports and top of DataCollectorService
sed -n '1,50p' src/services/data_collector.py

# Locate DataCollectorConfiguration definition
rg -n "class DataCollectorConfiguration" -A5 src/models/config.py
```

Length of output: 2088
Synchronize runner and service configuration sources

The `start_data_collector` runner takes a `DataCollectorConfiguration` parameter but never passes it into `DataCollectorService`, which always reads from the global `configuration.user_data_collection_configuration.data_collector`. This disconnect can cause the service to run with different settings than the runner expects.

Please update the implementation so both use the same config:

- In `src/runners/data_collector.py`, pass the runner's `configuration` into the service.
- In `src/services/data_collector.py`, add a constructor (or method parameter) to accept and store that config instead of pulling from the global.
- Replace all occurrences of `configuration.user_data_collection_configuration.data_collector` in the service with the injected config object.
Example diff:

```diff
# src/runners/data_collector.py
-    service = DataCollectorService()
+    service = DataCollectorService(configuration)

# src/services/data_collector.py
-class DataCollectorService:  # pylint: disable=too-few-public-methods
+class DataCollectorService:  # pylint: disable=too-few-public-methods
+    def __init__(self, config: DataCollectorConfiguration):
+        self._collector_config = config

     def run(self) -> None:
-        collector_config = configuration.user_data_collection_configuration.data_collector
+        collector_config = self._collector_config
         logger.info("Starting data collection service")
         …
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
--- src/runners/data_collector.py
@@ lines 19-26
     try:
-        service = DataCollectorService()
+        service = DataCollectorService(configuration)
         service.run()
     except Exception as e:
         logger.error(
             "Data collector service encountered an exception: %s", e, exc_info=True
         )
         raise
```

```diff
--- src/services/data_collector.py
@@
 class DataCollectorService:  # pylint: disable=too-few-public-methods
+    def __init__(self, config: DataCollectorConfiguration):
+        self._collector_config = config

     def run(self) -> None:
-        collector_config = configuration.user_data_collection_configuration.data_collector
+        collector_config = self._collector_config
         logger.info("Starting data collection service")
         …
```
🤖 Prompt for AI Agents
In src/runners/data_collector.py around lines 19 to 26, the DataCollectorService
is instantiated without passing the runner's DataCollectorConfiguration, causing
a mismatch in configuration sources. Modify the runner to pass its configuration
object to the DataCollectorService constructor. Then, in
src/services/data_collector.py, add a constructor parameter to accept this
configuration and store it as an instance variable. Replace all references to
the global configuration.user_data_collection_configuration.data_collector in
the service with this injected configuration to ensure both runner and service
use the same settings.
```yaml
data_collector:
  enabled: false
  ingress_server_url: null
  ingress_server_auth_token: null
  ingress_content_service_name: null
  collection_interval: 7200  # 2 hours in seconds
  cleanup_after_send: true
  connection_timeout_seconds: 30
```
💡 Verification agent
🧩 Analysis chain
Configuration section is well-structured with appropriate defaults.
The new data_collector section includes all necessary parameters with sensible default values. The service is disabled by default, which is appropriate for a new feature.
However, there's a potential field name inconsistency that should be verified.
Verify the field name consistency between the YAML configuration and the model definition:
The YAML uses connection_timeout_seconds while the model likely uses connection_timeout. This inconsistency could cause configuration parsing issues.
🏁 Script executed:
```shell
#!/bin/bash
# Check if the field name in YAML matches the model definition
echo "Checking field name in YAML configuration:"
grep -n "connection_timeout" lightspeed-stack.yaml

echo -e "\nChecking field name in model definition:"
ast-grep --pattern 'connection_timeout$_: $_' src/models/config.py
```

Length of output: 349
🏁 Script executed:
```shell
#!/bin/bash
# Search for 'connection_timeout_seconds' in model definitions
echo "Searching for 'connection_timeout_seconds' in model definitions:"
grep -R "connection_timeout_seconds" -n src/models

echo -e "\nSearching for DataCollector class definition and its fields:"
grep -R "class DataCollector" -n src/models
```

Length of output: 447
🏁 Script executed:

```shell
#!/bin/bash
# Locate the DataCollectorConfiguration class and list its fields
grep -n "class DataCollectorConfiguration" src/models/config.py
grep -A20 "class DataCollectorConfiguration" src/models/config.py

# Search specifically for timeout-related fields within the class
sed -n '88,120p' src/models/config.py | grep -n "timeout"
```

Length of output: 1387
Rename `connection_timeout_seconds` to `connection_timeout` in YAML

The Pydantic model `DataCollectorConfiguration` defines its timeout field as `connection_timeout`, so the YAML key `connection_timeout_seconds` will not be picked up (and may raise an "extra field" error or be ignored).
Please update lightspeed-stack.yaml at line 30:
```diff
   cleanup_after_send: true
-  connection_timeout_seconds: 30
+  connection_timeout: 30
```

This ensures the configuration maps correctly to `DataCollectorConfiguration.connection_timeout`.
🤖 Prompt for AI Agents
In lightspeed-stack.yaml at line 30, rename the key `connection_timeout_seconds`
to `connection_timeout` to match the Pydantic model field name
`connection_timeout`. This change ensures the YAML configuration correctly maps
to the DataCollectorConfiguration model without causing errors or being ignored.
tisnik left a comment:
it looks pretty ok
```yaml
  ingress_content_service_name: "lightspeed-team"
  collection_interval: 7200  # 2 hours in seconds
  cleanup_after_send: true
  connection_timeout: 30
```
Shouldn't it be a separate root-level configuration? Like

```yaml
user_data_collection:
  feedback_disabled: false
  feedback_storage: "/tmp/data/feedback"
  transcripts_disabled: false
  transcripts_storage: "/tmp/data/transcripts"

user_data_export:
  enabled: true
  ingress_server_url: "https://your-ingress-server.com"
  ingress_server_auth_token: "your-auth-token"
  ingress_content_service_name: "lightspeed-team"
  collection_interval: 7200  # 2 hours in seconds
  cleanup_after_send: true
  connection_timeout: 30
```

As those are two independent pieces.
Description
This pull request implements support for delivering user chat transcripts and feedback at a regular period to an ingress server.
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
Tests
Chores