v1.5.0 - Bug fixes for data accumulation and link analysis
Bug Fixes
Data Accumulation Fix
- Fixed issue where statistics accumulated when running multiple cron tasks
- Added automatic cleanup of old analysis data before each indexation
- Statistics now show correct page counts (e.g., 21 instead of 42)
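
The clean-before-insert pattern described above can be sketched as follows. This is a minimal illustration in Python with SQLite, not the extension's actual code: the table name `page_metrics` and the helpers `reindex` and `count_pages` are hypothetical.

```python
import sqlite3

def reindex(conn, page_uids):
    """Replace the stored analysis rows with a fresh set (hypothetical helper)."""
    with conn:  # one transaction: the delete and the inserts commit together
        conn.execute("DELETE FROM page_metrics")  # drop the previous run's rows
        conn.executemany(
            "INSERT INTO page_metrics (page_uid) VALUES (?)",
            [(uid,) for uid in page_uids],
        )

def count_pages(conn):
    """Return the number of rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM page_metrics").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_metrics (page_uid INTEGER NOT NULL)")

pages = list(range(1, 22))  # 21 pages
reindex(conn, pages)
reindex(conn, pages)        # a second run no longer doubles the rows
print(count_pages(conn))    # 21
```

Without the `DELETE`, the second run would leave 42 rows for 21 pages, which matches the symptom described in the notes.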
Link Analysis Improvements
- Restricted link detection to specific content fields (bodytext, header_link, pages, etc.)
- Removed aggressive regex that falsely detected numbers as page links
- Added deduplication for both pages and links
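
As a rough sketch of the idea, the Python snippet below scans only an allow-list of fields, matches only an explicit page-link syntax, and collects results in a set so duplicates collapse. Everything here is an assumption made for illustration: the `t3://page?uid=N` pattern, the two-field subset in `LINK_FIELDS`, and the function `extract_links` are not taken from the extension, and fields that store plain uid lists would need separate handling.

```python
import re

# Assumed subset of the content fields named in the notes.
LINK_FIELDS = ("bodytext", "header_link")

# Assumed explicit link syntax; bare numbers are deliberately not matched.
PAGE_LINK = re.compile(r"t3://page\?uid=(\d+)")

def extract_links(source_page, record):
    """Return sorted, unique (source_page, target_page) pairs for one record."""
    links = set()  # a set removes duplicate links
    for field in LINK_FIELDS:
        value = record.get(field)
        if not isinstance(value, str):
            continue  # skip missing or non-text fields
        for match in PAGE_LINK.finditer(value):
            links.add((int(source_page), int(match.group(1))))
    return sorted(links)

record = {
    "bodytext": '<a href="t3://page?uid=7">A</a>, <a href="t3://page?uid=7">B</a>, 42 items',
    "header_link": "t3://page?uid=9",
    "subheader": "t3://page?uid=99",  # not in the allow-list, so ignored
}
print(extract_links(5, record))  # [(5, 7), (5, 9)]
```

The bare number 42 in the body text is not reported as a link, and the repeated link to page 7 appears once.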
Type Casting & Edge Cases
- Fixed "Array to string conversion" errors in database operations
- Added proper integer casting for page_uid, source_page, target_page
- Added protection against division by zero in PageRank and centrality calculations
- Fixed orphaned themes cleanup query for better database compatibility
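
The casting and division guards can be illustrated with a short Python sketch. The helpers `to_int` and `degree_centrality` are hypothetical, and unwrapping the first element of an array value is only one possible policy, assumed here for the example; the notes do not say how the extension handles such values.

```python
def to_int(value, default=0):
    """Coerce a database value to int, tolerating arrays and bad input."""
    if isinstance(value, (list, tuple)):
        # Assumed policy: use the first element instead of stringifying the array.
        value = value[0] if value else default
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

def degree_centrality(link_count, total_pages):
    """Share of other pages linked, guarded against a zero denominator."""
    if total_pages <= 1:
        return 0.0  # a site with one page or none has no other pages to link to
    return link_count / (total_pages - 1)

print(to_int(["12"]))           # 12
print(to_int(None))             # 0
print(degree_centrality(3, 1))  # 0.0
print(degree_centrality(3, 4))  # 1.0
```

The same guard applies wherever a score is divided by a page or link count, such as when a PageRank step splits a page's rank across its outgoing links.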
Technical Changes
- PageMetricsService: Clean old data before inserting new analysis
- ThemeDataService: Clean theme data for specific page subtrees
- PageLinkService: Improved link detection accuracy and deduplication
Upgrade Notes
After upgrading, run your scheduler task once to clean up any accumulated data.