feat: support state recovery when meta reboot #1702

Merged
merged 10 commits into from
Apr 8, 2022

@yezizp2012 yezizp2012 commented Apr 8, 2022

What's changed and what's your intention?

As the title says, this PR makes several changes to support state recovery when meta reboots:

  • When meta goes offline, the frontend keeps re-subscribing until meta is back online. Once re-subscribed, the frontend refreshes its cache of catalog and worker info.
  • Change the logic of force_stop_actors. When a failover is detected on some compute nodes, the other compute nodes that contain actors in the related DAGs panic and go down. Furthermore, the original implementation of force_stop_actors causes a panic in the compute node, which is not acceptable when reusing it for a meta reboot. Instead, we now inject a stop barrier for all existing actors on the living compute nodes, which is enough to stop them all.
  • When meta reboots, we simply perform an operation similar to recovery. This could be refined in the future once more barrier state is persisted in the meta store.
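
The first two points can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `MetaClient`, `Snapshot`, `FrontendCache`, `FlakyMeta`, `resubscribe`, `Barrier`, `Mutation::Stop`, and `stop_barriers` are all hypothetical names, and the real RPC and barrier types differ.

```rust
/// Hypothetical full snapshot that meta returns on a successful subscription.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Snapshot {
    pub catalog: Vec<String>,
    pub workers: Vec<u32>,
}

/// Hypothetical stand-in for the meta client (not the real RPC interface).
pub trait MetaClient {
    /// Returns a snapshot on success, or an error while meta is offline.
    fn subscribe(&mut self) -> Result<Snapshot, String>;
}

/// Frontend-side cache of catalog and worker info.
#[derive(Debug, Default, PartialEq)]
pub struct FrontendCache {
    pub catalog: Vec<String>,
    pub workers: Vec<u32>,
}

/// Retries `subscribe` until meta answers, then replaces the cache wholesale.
/// `backoff` is called after each failure with the failure count so far.
/// Returns the number of failed attempts before the successful one.
pub fn resubscribe<C: MetaClient>(
    client: &mut C,
    cache: &mut FrontendCache,
    mut backoff: impl FnMut(u32),
) -> u32 {
    let mut failures = 0;
    loop {
        match client.subscribe() {
            Ok(snapshot) => {
                cache.catalog = snapshot.catalog;
                cache.workers = snapshot.workers;
                return failures;
            }
            Err(_) => {
                failures += 1;
                backoff(failures);
            }
        }
    }
}

/// Test double: meta is unreachable for the first `offline_attempts` calls.
pub struct FlakyMeta {
    pub offline_attempts: u32,
    pub snapshot: Snapshot,
}

impl MetaClient for FlakyMeta {
    fn subscribe(&mut self) -> Result<Snapshot, String> {
        if self.offline_attempts > 0 {
            self.offline_attempts -= 1;
            Err("meta unavailable".to_string())
        } else {
            Ok(self.snapshot.clone())
        }
    }
}

/// Hypothetical barrier mutation: stop the listed actors.
#[derive(Debug, Clone, PartialEq)]
pub enum Mutation {
    Stop(Vec<u32>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Barrier {
    pub epoch: u64,
    pub mutation: Mutation,
}

/// Builds one stop barrier per living compute node, covering every actor
/// that exists on that node. Dead nodes and nodes without actors are skipped.
pub fn stop_barriers(
    actors_by_node: &[(u32, Vec<u32>)],
    live_nodes: &[u32],
    epoch: u64,
) -> Vec<(u32, Barrier)> {
    actors_by_node
        .iter()
        .filter(|(node, actors)| live_nodes.contains(node) && !actors.is_empty())
        .map(|(node, actors)| {
            let barrier = Barrier { epoch, mutation: Mutation::Stop(actors.clone()) };
            (*node, barrier)
        })
        .collect()
}
```

In a real frontend the `backoff` closure would typically sleep between attempts, for example `|_| std::thread::sleep(std::time::Duration::from_secs(1))`. Replacing the cache wholesale, rather than patching it, avoids reasoning about which notifications were missed while meta was down.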

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

Resolve #1277

codecov bot commented Apr 8, 2022

Codecov Report

Merging #1702 (1dce082) into main (1da4127) will increase coverage by 0.19%.
The diff coverage is 54.14%.

@@            Coverage Diff             @@
##             main    #1702      +/-   ##
==========================================
+ Coverage   71.15%   71.34%   +0.19%     
==========================================
  Files         598      599       +1     
  Lines       77556    77645      +89     
==========================================
+ Hits        55182    55399     +217     
+ Misses      22374    22246     -128     
Flag Coverage Δ
rust 71.34% <54.14%> (+0.19%) ⬆️

Impacted Files Coverage Δ
src/compute/src/rpc/service/stream_service.rs 0.00% <0.00%> (ø)
src/compute/src/server.rs 0.00% <0.00%> (ø)
src/ctl/src/common/meta_service.rs 0.00% <0.00%> (ø)
src/frontend/src/catalog/root_catalog.rs 71.96% <0.00%> (-2.25%) ⬇️
src/frontend/src/observer/observer_manager.rs 0.00% <0.00%> (ø)
src/frontend/src/scheduler/schedule.rs 9.58% <0.00%> (-0.56%) ⬇️
src/frontend/src/session.rs 44.55% <0.00%> (ø)
src/meta/src/model/mod.rs 99.05% <ø> (ø)
src/meta/src/rpc/server.rs 0.00% <0.00%> (ø)
src/rpc_client/src/meta_client.rs 0.00% <0.00%> (ø)
... and 22 more

@BugenZhao BugenZhao left a comment

Rest LGTM. Great job!

src/meta/src/barrier/mod.rs (outdated, resolved)
src/meta/src/barrier/mod.rs (resolved)
src/meta/src/barrier/recovery.rs (outdated, resolved)
src/stream/src/task/stream_manager.rs (resolved)
src/meta/src/model/barrier.rs (outdated, resolved)
src/meta/src/model/barrier.rs (outdated, resolved)

@BugenZhao BugenZhao left a comment

LGTM!

src/meta/src/barrier/mod.rs (resolved)
@yezizp2012 yezizp2012 enabled auto-merge (squash) April 8, 2022 10:41
@yezizp2012 yezizp2012 merged commit 710cbaf into main Apr 8, 2022
@yezizp2012 yezizp2012 deleted the feat/support-meta-reboot branch April 8, 2022 10:53
Successfully merging this pull request may close these issues.

meta: state recovery when reboot meta