Skip to content

Enhance error handling for journal v2 migration#21514

Merged
stelfrag merged 6 commits intonetdata:masterfrom
stelfrag:verify_valid_metric
Jan 8, 2026
Merged

Enhance error handling for journal v2 migration#21514
stelfrag merged 6 commits intonetdata:masterfrom
stelfrag:verify_valid_metric

Conversation

@stelfrag
Copy link
Collaborator

@stelfrag stelfrag commented Jan 3, 2026

Summary
  • Add checks to safely handle deleted metrics during page migration and cache operations.
  • Return NULL for invalid metric duplications in mrg_metric_dup.
  • Ensure proper resource cleanup and state marking when encountering deleted metrics.

Summary by cubic

Hardened journal v2 migration, cache, and storage engine paths to safely handle deleted or invalid metrics and null handles, preventing crashes and stuck queries. mrg_metric_dup now returns NULL on failed acquire, and callers skip work and clean up resources.

  • Bug Fixes
    • mrg_metric_dup: return NULL if metric_acquire fails.
    • Migration: validate UUID early; handle NULL metric/UUID; cleanup and continue.
    • Query preload: on NULL metric, complete prep/page, destroy PDC, adjust counters, and return.
    • PDC: guard metric release when NULL; clean up in error paths.
    • Storage engine: no-op when STORAGE_COLLECT_HANDLE is NULL in store, finalize, and frequency change; store init returns NULL safely.

Written for commit e70599e. Summary will update on new commits.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances error handling throughout the journal v2 migration and metric collection paths to safely handle deleted or invalid metrics. The core change is making mrg_metric_dup return NULL when metric acquisition fails (due to deletion), with corresponding NULL checks and cleanup logic added across the storage engine, cache, and query paths.

Key changes:

  • Modified mrg_metric_dup to return NULL on failed metric acquisition instead of assuming success
  • Added NULL handle guards in storage engine operations to prevent crashes when metrics are deleted
  • Implemented cleanup paths in migration, query preload, and collection initialization when encountering NULL metrics

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/database/engine/mrg.c Returns NULL from mrg_metric_dup when metric_acquire fails due to deleted metrics
src/database/engine/rrdengineapi.c Handles NULL return from mrg_metric_dup by cleaning up writer state and returning NULL
src/database/engine/pdc.c Guards metric release in PDC destructor to prevent NULL pointer dereference
src/database/engine/pagecache.c Adds NULL metric handling in query preload with completion initialization and cleanup
src/database/engine/cache.c Checks for NULL metric and NULL UUID during journal v2 migration with page cleanup
src/database/storage-engine.h Adds NULL checks to storage engine API functions for missing handles

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Add checks to safely handle deleted metrics during page migration and cache operations.
- Return NULL for invalid metric duplications in `mrg_metric_dup`.
- Ensure proper resource cleanup and state marking when encountering deleted metrics.

Enhance error handling for journal v2 migration: skip metrics with NULL UUID
…tialization logic

- Safely return when `STORAGE_COLLECT_HANDLE` is null in key storage engine methods.
- Reorder and streamline `pdc` initialization to avoid redundancy and improve clarity.
…error paths

- Add early UUID validity checks during metric migration to avoid unnecessary operations.
- Adjust inflight query counters in `pagecache` on query completion for accurate tracking.
@stelfrag stelfrag force-pushed the verify_valid_metric branch from 7cc7418 to e70599e Compare January 5, 2026 09:26
@stelfrag stelfrag marked this pull request as ready for review January 8, 2026 09:55
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 6 files

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@thiagoftsm thiagoftsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found running for more than one hour. LGTM!

@stelfrag stelfrag merged commit e82c147 into netdata:master Jan 8, 2026
166 of 167 checks passed
@stelfrag stelfrag deleted the verify_valid_metric branch January 8, 2026 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants