Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(meta): add lease for meta and remove check_exists in hummock #2951

Merged
merged 10 commits into from
Jun 15, 2022

Conversation

Little-Wallace
Copy link
Contributor

@Little-Wallace Little-Wallace commented Jun 1, 2022

Signed-off-by: Little-Wallace bupt2013211450@gmail.com

What's changed and what's your intention?

close #2515
In this PR, we only need to check whether the leader info changed when meta-service may reload after panic, or be partitioned with etcd.
If some other node became new meta-node, it will changed leader info and all transaction submitted by the origin meta-node will fail.

Checklist

  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace Little-Wallace changed the title feat(meta): election leader by etcd feat(meta): add lease for meta and remove check_exists in hummock Jun 1, 2022
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
META_CF_NAME.to_owned(),
META_LEADER_KEY.as_bytes().to_vec(),
info.encode_to_vec(),
);
meta_store.txn(trx).await.map_err(Into::into)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if context_id becomes invalid just before meta_store.txn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.... It seems that I need a memory lock for context_id, does it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.... It seems that I need a memory lock for context_id, does it?

Yeah something prevents ClusterManager from delete_worker_node.

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@zwang28
Copy link
Contributor

zwang28 commented Jun 13, 2022

LGTM.
@yezizp2012 FYI we'd like to add meta lease.

@@ -65,7 +67,10 @@ enum Backend {
pub struct MetaNodeOpts {
// TODO: rename to listen_address and separate out the port.
#[clap(long, default_value = "127.0.0.1:5690")]
host: String,
listen_addr: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@yezizp2012
Copy link
Contributor

LGTM.

@Little-Wallace
Copy link
Contributor Author

I talked about lease and context id with @zwang28 .
When some node lost connection with meta, the timestamp of it would not be updated and meta-service will remove this node and call release_context. But the operation release_context and report_compact_task or other operations in hummock manager are all blocked by RwLock. So if ClusterManager removed context_id after some thread check context_id and just before it write to meta-storage, it would be blocked by compaction until other thread finished writing to meta-storage. And then, it can see the results of the previous operations.

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@zwang28 zwang28 self-requested a review June 14, 2022 14:00
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@codecov
Copy link

codecov bot commented Jun 15, 2022

Codecov Report

Merging #2951 (9a69191) into main (48efe99) will decrease coverage by 0.09%.
The diff coverage is 19.43%.

@@            Coverage Diff             @@
##             main    #2951      +/-   ##
==========================================
- Coverage   73.57%   73.48%   -0.10%     
==========================================
  Files         744      744              
  Lines      102191   102397     +206     
==========================================
+ Hits        75192    75245      +53     
- Misses      26999    27152     +153     
Flag Coverage Δ
rust 73.48% <19.43%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/meta/src/lib.rs 1.33% <0.00%> (-0.26%) ⬇️
src/meta/src/rpc/server.rs 0.00% <0.00%> (ø)
src/meta/src/storage/etcd_meta_store.rs 0.00% <0.00%> (ø)
src/meta/src/test_utils.rs 0.00% <0.00%> (ø)
src/risedevtool/src/task/meta_node_service.rs 0.00% <0.00%> (ø)
src/meta/src/manager/env.rs 75.00% <87.09%> (+5.76%) ⬆️
src/meta/src/storage/mem_meta_store.rs 98.41% <87.50%> (-1.59%) ⬇️
src/meta/src/hummock/hummock_manager.rs 87.28% <100.00%> (+0.33%) ⬆️
src/meta/src/storage/transaction.rs 90.32% <100.00%> (+1.03%) ⬆️
... and 6 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@Little-Wallace Little-Wallace merged commit e87390a into main Jun 15, 2022
@Little-Wallace Little-Wallace deleted the wallace/leader branch June 15, 2022 05:39
@skyzh
Copy link
Contributor

skyzh commented Jun 21, 2022

This PR breaks the usage of --hosts args. Next time please inform us to update docker and risingwave-operator args, so that everything will work fine.

cc @Nebulazhang @Little-Wallace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor(meta): remove check_exists in meta transaction
4 participants