Skip to content

Refactor timeline enrichment pipeline to use DuckDB for enrichment and updates, implement batch DB updates, error handling, CLI integration, and maintain file fallback for backwards compatibility#29

Closed
jaxfry wants to merge 1 commit intomainfrom
feat/duckdb-timeline-enrichment

Conversation

@jaxfry
Copy link
Copy Markdown
Owner

@jaxfry jaxfry commented Jun 12, 2025

This pull request introduces major enhancements to the timeline enrichment pipeline, transitioning it from file-based operations to database-backed functionality. Key changes include robust database integration, CLI updates, improved error handling, and comprehensive testing to ensure backwards compatibility and reliability during the migration process.

Database Integration Enhancements:

  • Added new database tables (timeline_events and project_memory) with schema updates to support project classification and enrichment operations. (LifeLog/database/__init__.py, [1] [2]
  • Implemented database migration logic to ensure seamless updates to existing schemas, including adding a project column to timeline_events. (LifeLog/database/__init__.py, LifeLog/database/init.pyR42-R72)

CLI Updates:

  • Refactored the enrich subcommand to leverage database operations, introducing new options such as --batch-size and --fallback-to-files. Added error handling for database failures with optional fallback to file-based processing. (LifeLog/cli.py, LifeLog/cli.pyL68-R96)

Project Memory Enhancements:

  • Updated the ProjectMemory class to support database-backed storage, enabling real-time updates and fallback to file-based storage when the database is unavailable. (LifeLog/enrichment/project_classifier.py, [1] [2]
  • Added methods for loading, saving, and updating project memory entries in the database, ensuring data consistency during enrichment operations. (LifeLog/enrichment/project_classifier.py, [1] [2]

Configuration Updates:

  • Introduced new database-related settings in the Settings class, including options for enabling database usage, batch sizes, retry limits, and connection timeouts. (LifeLog/config.py, LifeLog/config.pyR68-R76)

Testing and Error Handling:

  • Created comprehensive tests for database enrichment functionality, covering event loading, batch updates, fallback mechanisms, and backwards compatibility. All tests pass successfully. (DATABASE_ENRICHMENT_COMPLETION.md, DATABASE_ENRICHMENT_COMPLETION.mdR1-R143)
  • Added logging and error handling across the enrichment pipeline to ensure graceful degradation during partial or complete database failures. (LifeLog/enrichment/project_classifier.py, [1] [2]

These changes collectively ensure a smooth transition to database-backed operations while maintaining reliability, compatibility, and robust error recovery mechanisms.

…d updates, implement batch DB updates, error handling, CLI integration, and maintain file fallback for backwards compatibility
@jaxfry jaxfry requested a review from Copilot June 12, 2025 07:03
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the timeline enrichment pipeline to transition from file‐based operations to database‐backed functionality. Key changes include:

  • Database integration enhancements with new tables and migration logic.
  • CLI updates to support database operations (e.g. batch processing and fallback options).
  • Project memory enhancements to persist data in the database with file fallback support.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/test_project_classifier.py Updated test configuration to use file-based processing.
tests/test_enrichment_db.py Added tests for database update functionality.
tests/test_database_enrichment.py Comprehensive tests covering DB operations, fallback and recovery.
test_cli.py New CLI test script for verifying enriched command functionality.
LifeLog/summary/daily.py Updated embedding model loading with graceful degradation.
LifeLog/enrichment/project_classifier.py Refactored to support database-backed storage with fallback.
LifeLog/database/init.py Added migration logic and new DB schema for project memory.
LifeLog/config.py Introduced new database-related settings.
LifeLog/cli.py Updated CLI arguments and error handling for database enrichment.
DATABASE_ENRICHMENT_COMPLETION.md Documentation update detailing completion of DB-based enrichment.
Comments suppressed due to low confidence (1)

LifeLog/cli.py:93

  • Since the '--fallback-to-files' option is enabled but the fallback logic is not implemented, consider either removing the option until implemented or adding the necessary file-based fallback processing to ensure proper behavior.
log.warning("File-based fallback not yet implemented.")

# Check if project column exists, if not add it
try:
conn.execute("SELECT project FROM timeline_events LIMIT 1")
except:
Copy link

Copilot AI Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using a bare 'except' clause; instead, catch specific exceptions (e.g., duckdb.Error) to prevent unexpected error suppression.

Suggested change
except:
except duckdb.Error:

Copilot uses AI. Check for mistakes.
@jaxfry jaxfry closed this Jun 13, 2025
@jaxfry jaxfry deleted the feat/duckdb-timeline-enrichment branch June 13, 2025 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants