-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add graphman command to change block cache shard #5169
Conversation
) | ||
.execute(&conn)?; | ||
Ok(()) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I am wondering about is what happens when this block gets interrupted (e.g., because the user presses ^C) Since there are three shards involved, it may leave everything in a messy state, possibly in a state where graph-node
won't start again. I think it's right to use one transaction for the primary, at least that way, we won't lose the constraint.
Maybe this order of operations is safer:
- Add the new chain store to the new shard (no change to primary yet) It's a little sucky since you can't use the
add_chain
helper for that as that modifies the primary - Update the
chains
table in the primary (in one txn: updatechains
entry to-old
, create new entry for chain name pointing to new shard) - Update the name in the old shard to
-old
I am not 100% sure that that leaves everything at least in a state where graph-node
will start again, but I think it's close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not entirely sure of what you suggested, i've made some changes now, could you take a look.
Changed the order, but had to get the next value for the chains_id_seq when doing this before calling add_chain
graph-node/node/src/manager/commands/chain.rs
Lines 183 to 221 in 8cded38
conn.transaction(|| -> Result<(), StoreError> { | |
let ident = chain_store.chain_identifier.clone(); | |
let shard = Shard::new(shard.to_string())?; | |
// Fetch the current last_value from the sequence | |
let result = sql_query("SELECT last_value FROM chains_id_seq").get_result::<ChainIdSeq>(&conn)?; | |
let last_val = result.last_value; | |
let next_val = last_val + 1; | |
let namespace = format!("chain{}", next_val); | |
let storage= Storage::new(namespace.to_string()).map_err(|e| { | |
anyhow!("Failed to create storage: {}", e) | |
})?; | |
store.add_chain_store_inner(&chain_name, &shard, &ident, storage,ChainStatus::Ingestible, true)?; | |
// Drop the foreign key constraint on deployment_schemas | |
sql_query( | |
"alter table deployment_schemas drop constraint deployment_schemas_network_fkey;", | |
) | |
.execute(&conn)?; | |
// Update the current chain name to chain-old | |
update_chain_name(&conn, &chain_name, &new_name)?; | |
chain_store.update_name(&new_name)?; | |
// Create a new chain with the name in the destination shard | |
let _= add_chain(&conn, &chain_name, &ident, &shard)?; | |
// Re-add the foreign key constraint | |
sql_query( | |
"alter table deployment_schemas add constraint deployment_schemas_network_fkey foreign key (network) references chains(name);", | |
) | |
.execute(&conn)?; | |
Ok(()) | |
})?; | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am trying to get at is that the conn.transaction
block puts changes in the primary into a transaction. If that gets interrupted, the transaction in the primary gets rolled back, but changes made in other shards will stick around. My concern then is that we could end up in a state where the primary still points at the chain in the old shard (in chains
), but we've already updated the chain name in that shard (in ethereum_networks
), and so when graph-node
starts up again, it won't find that chain since it first looks at the chains
table to figure out which shard the block cache is in, but can't find the corresponding entry in ethereum_networks
.
I think the only change that's needed from what you have right now is to move the call to chain_store.update_name(&new_name)?;
out of the transaction block. When the txn commits we have an entry for chain
in the primary that points to the new shard and is correct. We'd also have an entry for chain-old
that points at the old shard and is incorrect (ethereum_networks
there still has the entry for chain
), but that's tolerable since graph-node
should still start up though I am not 100% confident about that.
Unfortunately, this is really tricky, and testing all possible points where an interrupt (or error, e.g., because the db crashed) could happen is very hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the change to the transaction block (moving chain_store.update_name(&new_name)?
to after it) before merging. The other things are more suggestions.
node/src/manager/commands/chain.rs
Outdated
// Drop the foriegn key constraint on deployment_schemas | ||
|
||
// Fetch the current last_value from the sequence | ||
let result = sql_query("SELECT last_value FROM chains_id_seq").get_result::<ChainIdSeq>(&conn)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just do select nextval(chains_id_seq)
here instead of incrementing manually or even select 'chain' || nextval(chains_id_seq)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be a bit cleaner to put this logic into a method fn block_store::primary::Chain::allocate_chain
that does basically what lines 188-197 do here and returns a Chain
. That way, you don't need a add_chain_store_inner
, and the logic is closer to the 'normal' flow where Chain::add_chain
gets called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't using next_value
wouldn't that increment the sequence too than not? I manually incremented because i don't want the increment in the sequence myself but just predict the next value.
) | ||
.execute(&conn)?; | ||
Ok(()) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am trying to get at is that the conn.transaction
block puts changes in the primary into a transaction. If that gets interrupted, the transaction in the primary gets rolled back, but changes made in other shards will stick around. My concern then is that we could end up in a state where the primary still points at the chain in the old shard (in chains
), but we've already updated the chain name in that shard (in ethereum_networks
), and so when graph-node
starts up again, it won't find that chain since it first looks at the chains
table to figure out which shard the block cache is in, but can't find the corresponding entry in ethereum_networks
.
I think the only change that's needed from what you have right now is to move the call to chain_store.update_name(&new_name)?;
out of the transaction block. When the txn commits we have an entry for chain
in the primary that points to the new shard and is correct. We'd also have an entry for chain-old
that points at the old shard and is incorrect (ethereum_networks
there still has the entry for chain
), but that's tolerable since graph-node
should still start up though I am not 100% confident about that.
Unfortunately, this is really tricky, and testing all possible points where an interrupt (or error, e.g., because the db crashed) could happen is very hard.
@lutter ok to merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's really good now!
Closes #5137