@chrislusf (Collaborator) commented Jan 1, 2026

This PR enables immediate EC shard reporting during volume server startup:

  1. Move channel creation in NewStore to before volume loading.
  2. Add ecShardNotifyHandler with non-blocking send logic to prevent deadlock.
  3. Update DiskLocation to support the callback.

Summary by CodeRabbit

  • New Features

    • Startup now emits erasure-coded (EC) shard information earlier and per-shard notifications are available during shard loading.
  • Improvements

    • Pre-initialized notification channels to ensure non-blocking startup reporting.
    • More robust detection, validation and cleanup for incomplete or distributed EC volumes.
    • Startup now logs and defers reporting when notification channels are full to avoid blocked initialization.


Ported the immediate EC shard reporting feature from Enterprise to Community version.
This allows the master to be notified about EC shards immediately during volume server startup,
instead of waiting for the first heartbeat.

Changes:
1. Updated NewStore to initialize notification channels BEFORE loading volumes (fixes potential nil panic).
2. Added ecShardNotifyHandler to report EC shards to NewEcShardsChan during startup.
3. Implemented non-blocking channel send for EC reporting to prevent deadlock when loading many EC shards (fixing the enterprise bug 17ac129).
4. Updated DiskLocation and EC loading logic to support the callback.

This optimization improves cluster state consistency and startup speed for EC-heavy clusters.
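
As a minimal sketch of change 3, the non-blocking send can be modeled with a `select` that has a `default` branch. The names `shardInfo`, `trySend`, and `reportShards` are hypothetical and are not taken from the PR; `shardInfo` only stands in for the real `VolumeEcShardInformationMessage`. The sketch assumes nothing is reading the channel yet during startup, which is why a plain blocking send could hang once the buffer fills.

```go
package main

import "fmt"

// shardInfo is a hypothetical stand-in for VolumeEcShardInformationMessage.
type shardInfo struct {
	Collection string
	VolumeID   uint32
	ShardID    uint8
}

// trySend attempts a non-blocking send and reports whether the message was queued.
func trySend(ch chan<- shardInfo, msg shardInfo) bool {
	select {
	case ch <- msg:
		return true
	default:
		// Buffer is full and nobody is receiving: give up instead of blocking startup.
		return false
	}
}

// reportShards sends one message per shard and returns how many were queued
// and how many were left for a later report.
func reportShards(ch chan<- shardInfo, vid uint32, shards int) (queued, deferred int) {
	for i := 0; i < shards; i++ {
		msg := shardInfo{Collection: "demo", VolumeID: vid, ShardID: uint8(i)}
		if trySend(ch, msg) {
			queued++
		} else {
			deferred++
		}
	}
	return queued, deferred
}

func main() {
	ch := make(chan shardInfo, 3) // small buffer, no receiver running yet
	queued, deferred := reportShards(ch, 7, 14)
	fmt.Printf("queued=%d deferred=%d\n", queued, deferred) // prints "queued=3 deferred=11"
}
```

With a blocking `ch <- msg` in place of the `select`, the fourth send in this sketch would never return, because no goroutine is draining the channel.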

@coderabbitai (Contributor, bot) commented Jan 1, 2026

Caution

Review failed

The pull request is closed.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a per-shard callback into EC shard loading: DiskLocation gains an ecShardNotifyHandler; EC loaders accept an on-shard callback; Store initializes EC-related channels and wires a startup handler that non-blockingly reports EC shard info to NewEcShardsChan.
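
The callback threading described in the walkthrough can be sketched roughly as follows. This is an illustration only: the types `VolumeId`, `ShardId`, and `EcVolume` are simplified local placeholders for the real `needle` and `erasure_coding` types, and `loadEcShards` is a hypothetical loader that skips all file handling. Only the handler's parameter list follows the signature quoted in the changes summary.

```go
package main

import "fmt"

// Simplified placeholders for needle.VolumeId and erasure_coding.ShardId.
type VolumeId uint32
type ShardId uint8

// EcVolume is a hypothetical placeholder for erasure_coding.EcVolume.
type EcVolume struct {
	Collection string
	Id         VolumeId
}

// onShardLoadFunc follows the handler shape quoted in the summary.
type onShardLoadFunc func(collection string, vid VolumeId, shardId ShardId, ecVolume *EcVolume)

// DiskLocation holds the optional per-shard notification handler.
type DiskLocation struct {
	ecShardNotifyHandler onShardLoadFunc
}

// loadEcShards pretends to load the given shards and invokes the callback
// once per shard. A nil callback is allowed and simply skipped.
func (l *DiskLocation) loadEcShards(collection string, vid VolumeId, shardIds []ShardId, onShardLoad onShardLoadFunc) {
	ev := &EcVolume{Collection: collection, Id: vid}
	for _, sid := range shardIds {
		// Real code would open and validate the shard files here.
		if onShardLoad != nil {
			onShardLoad(collection, vid, sid, ev)
		}
	}
}

func main() {
	var seen []string
	loc := &DiskLocation{
		ecShardNotifyHandler: func(collection string, vid VolumeId, sid ShardId, ev *EcVolume) {
			seen = append(seen, fmt.Sprintf("%s/%d.%d", collection, vid, sid))
		},
	}
	loc.loadEcShards("demo", 5, []ShardId{0, 1, 2}, loc.ecShardNotifyHandler)
	fmt.Println(seen) // prints "[demo/5.0 demo/5.1 demo/5.2]"
}
```

Keeping the handler as a plain function field means the loader needs no knowledge of channels or the store; callers that do not care about notifications can pass nil.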

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| EC Shard Callback & Helpers: weed/storage/disk_location_ec.go | Adds loadEcShardsWithCallback and loadAllEcShardsWithCallback, threads an onShardLoad callback through handleFoundEcxFile and EC grouping paths, and adds helpers (checkDatFileExists, calculateExpectedShardSize, checkOrphanedShards, removeEcVolumeFiles) used during shard loading/cleanup. |
| DiskLocation Field: weed/storage/disk_location.go | Adds ecShardNotifyHandler func(collection string, vid needle.VolumeId, shardId erasure_coding.ShardId, ecVolume *erasure_coding.EcVolume) to DiskLocation and switches EC loading to use the callback-enabled loader. |
| Store Channels & Startup Handler: weed/storage/store.go | Pre-initializes NewVolumesChan, DeletedVolumesChan, NewEcShardsChan, DeletedEcShardsChan; assigns ecShardNotifyHandler on locations to non-blockingly emit VolumeEcShardInformationMessage to NewEcShardsChan during startup (logs if channel full and defers to heartbeat); removes redundant channel re-creation. |

Sequence Diagram

sequenceDiagram
    participant Store
    participant DiskLocation
    participant ECLoader as EC Loader
    participant Handler as Notify Handler
    participant Chan as NewEcShardsChan

    Store->>Store: create channels (NewEcShardsChan, DeletedEcShardsChan, ...)
    Store->>DiskLocation: set ecShardNotifyHandler(handler)
    Store->>DiskLocation: loadAllEcShardsWithCallback(onShardLoad)

    loop per EC shard
      DiskLocation->>ECLoader: load shard files / validate
      ECLoader->>Handler: onShardLoad(collection, vid, shardId, ecVolume)

      rect rgba(90,170,120,0.08)
      Note over Handler,Chan: handler enqueues startup notification
      alt channel has capacity
        Handler->>Chan: non-blocking send VolumeEcShardInformationMessage
      else channel full
        Handler->>Handler: log "startup channel full" (defer to heartbeat)
      end
      end
    end
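
The "defer to heartbeat" branch of the diagram can be illustrated with a small model. The PR page does not show the heartbeat protocol, so everything here is an assumption: the `store` type, `drainDeltas`, and `fullReport` are hypothetical names, and the idea is only that the channel carries best-effort deltas while an in-memory set remains the source of truth for a later full report.

```go
package main

import (
	"fmt"
	"sort"
)

// shardKey identifies one EC shard in this simplified model.
type shardKey struct {
	VolumeID uint32
	ShardID  uint8
}

// store keeps the authoritative set of loaded shards plus a best-effort delta channel.
type store struct {
	loaded    map[shardKey]bool
	newShards chan shardKey
}

func newStore(buffer int) *store {
	return &store{loaded: make(map[shardKey]bool), newShards: make(chan shardKey, buffer)}
}

// onShardLoad always records the shard, then tries a non-blocking notification.
func (s *store) onShardLoad(k shardKey) {
	s.loaded[k] = true
	select {
	case s.newShards <- k:
	default:
		// Channel full: the shard is still in s.loaded and shows up in fullReport.
	}
}

// drainDeltas returns every queued notification without blocking.
func (s *store) drainDeltas() []shardKey {
	var out []shardKey
	for {
		select {
		case k := <-s.newShards:
			out = append(out, k)
		default:
			return out
		}
	}
}

// fullReport returns all loaded shards, sorted by shard ID, as a full heartbeat might.
func (s *store) fullReport() []shardKey {
	out := make([]shardKey, 0, len(s.loaded))
	for k := range s.loaded {
		out = append(out, k)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ShardID < out[j].ShardID })
	return out
}

func main() {
	s := newStore(2)
	for i := 0; i < 5; i++ {
		s.onShardLoad(shardKey{VolumeID: 9, ShardID: uint8(i)})
	}
	fmt.Println(len(s.drainDeltas()), len(s.fullReport())) // prints "2 5"
}
```

In this model a dropped notification costs only latency: the three shards that missed the channel are still present in the full report.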

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Poem

🐰 I hop through shards at break of day,

I tap a callback, send their way.
Channels wake and messages spring,
Startup hops—each shard gets a ring. 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Description check | ❓ Inconclusive | The description addresses the template's core sections but lacks detail on testing and omits required checklist items, though it covers the problem and solution approach. | Complete the description template by adding 'How is the PR tested?' section details and filling out the checklist items (unit tests, wiki documentation) to fully comply with repository requirements. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: enabling immediate EC shard reporting during startup, which aligns with the changeset's core objective. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 718d0c9 and 5ba9b6e.

📒 Files selected for processing (1)
  • weed/storage/store.go


@gemini-code-assist (Contributor)

Summary of Changes

Hello @chrislusf, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request optimizes the startup process of the volume server by enabling immediate reporting of erasure coded (EC) shards. Previously, EC shards would only be reported to the master during subsequent heartbeat cycles, leading to a delay in the master's awareness of available shards. By introducing a callback mechanism during shard loading and ensuring channels are initialized earlier, the system can now report newly loaded EC shards as soon as they are discovered, improving system responsiveness and consistency from the moment a volume server starts.

Highlights

  • Immediate EC Shard Reporting: Enables erasure coded (EC) shards to be reported immediately during volume server startup, rather than waiting for a full heartbeat cycle, improving system awareness.
  • Callback Mechanism for Shard Loading: Introduces an ecShardNotifyHandler callback function in DiskLocation to allow external logic to be executed as soon as an EC shard is loaded.
  • Non-Blocking Channel Send: Implements a non-blocking send to the NewEcShardsChan during startup to prevent potential deadlocks. If the channel is full, reporting will occur via the regular heartbeat.
  • Channel Initialization Order: Reorders channel creation in the NewStore function to occur before volume loading, ensuring that channels are ready for immediate reporting of volumes and EC shards.
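
The ordering point in the last highlight can be shown with a simplified model. The `Store` type below is a hypothetical, stripped-down stand-in with a single `chan int`, not the real store. It relies on one Go language fact: a send on a nil channel is never ready, so a `select` with a `default` branch always takes `default`. In this model, loading before the channel exists means no startup notification is ever queued.

```go
package main

import "fmt"

// Store is a hypothetical, stripped-down stand-in for the real store type.
type Store struct {
	NewEcShardsChan chan int
}

// notify performs a non-blocking send and reports whether it succeeded.
func (s *Store) notify(shardID int) bool {
	select {
	case s.NewEcShardsChan <- shardID:
		return true
	default:
		// Taken when the channel is full, and always when it is nil.
		return false
	}
}

// load simulates loading shards and returns how many notifications were queued.
func (s *Store) load(shards int) int {
	sent := 0
	for i := 0; i < shards; i++ {
		if s.notify(i) {
			sent++
		}
	}
	return sent
}

// loadThenInit models the old order: shards load while the channel is still nil.
func loadThenInit(shards int) int {
	s := &Store{}
	sent := s.load(shards)
	s.NewEcShardsChan = make(chan int, shards) // created too late to be useful
	return sent
}

// initThenLoad models the new order: the channel exists before loading starts.
func initThenLoad(shards int) int {
	s := &Store{NewEcShardsChan: make(chan int, shards)}
	return s.load(shards)
}

func main() {
	fmt.Println(loadThenInit(4), initThenLoad(4)) // prints "0 4"
}
```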


gemini-code-assist[bot]

This comment was marked as resolved.

@chrislusf chrislusf requested a review from Copilot January 1, 2026 22:46
gemini-code-assist[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

This comment was marked as resolved.

@chrislusf chrislusf merged commit 4e2af08 into master Jan 1, 2026
36 of 37 checks passed
@chrislusf chrislusf deleted the optimize/ec-shard-reporting-startup branch January 1, 2026 23:39