Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add graphman command to change block cache shard #5169

Merged
merged 3 commits into from
Mar 7, 2024

Conversation

incrypto32
Copy link
Contributor

Closes #5137

@fordN fordN requested a review from lutter January 29, 2024 16:47
node/src/manager/commands/chain.rs Outdated Show resolved Hide resolved
node/src/manager/commands/chain.rs Outdated Show resolved Hide resolved
)
.execute(&conn)?;
Ok(())
})?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I am wondering about is what happens when this block gets interrupted (e.g., because the user presses ^C) Since there are three shards involved, it may leave everything in a messy state, possibly in a state where graph-node won't start again. I think it's right to use one transaction for the primary, at least that way, we won't lose the constraint.

Maybe this order of operations is safer:

  1. Add the new chain store to the new shard (no change to primary yet) It's a little sucky since you can't use the add_chain helper for that as that modifies the primary
  2. Update the chains table in the primary (in one txn: update chains entry to -old, create new entry for chain name pointing to new shard)
  3. Update the name in the old shard to -old

I am not 100% sure that that leaves everything at least in a state where graph-node will start again, but I think it's close.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not entirely sure of what you suggested, i've made some changes now, could you take a look.

Changed the order, but had to get the next value for the chains_id_seq when doing this before calling add_chain

conn.transaction(|| -> Result<(), StoreError> {
let ident = chain_store.chain_identifier.clone();
let shard = Shard::new(shard.to_string())?;
// Fetch the current last_value from the sequence
let result = sql_query("SELECT last_value FROM chains_id_seq").get_result::<ChainIdSeq>(&conn)?;
let last_val = result.last_value;
let next_val = last_val + 1;
let namespace = format!("chain{}", next_val);
let storage= Storage::new(namespace.to_string()).map_err(|e| {
anyhow!("Failed to create storage: {}", e)
})?;
store.add_chain_store_inner(&chain_name, &shard, &ident, storage,ChainStatus::Ingestible, true)?;
// Drop the foreign key constraint on deployment_schemas
sql_query(
"alter table deployment_schemas drop constraint deployment_schemas_network_fkey;",
)
.execute(&conn)?;
// Update the current chain name to chain-old
update_chain_name(&conn, &chain_name, &new_name)?;
chain_store.update_name(&new_name)?;
// Create a new chain with the name in the destination shard
let _= add_chain(&conn, &chain_name, &ident, &shard)?;
// Re-add the foreign key constraint
sql_query(
"alter table deployment_schemas add constraint deployment_schemas_network_fkey foreign key (network) references chains(name);",
)
.execute(&conn)?;
Ok(())
})?;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I am trying to get at is that the conn.transaction block puts changes in the primary into a transaction. If that gets interrupted, the transaction in the primary gets rolled back, but changes made in other shards will stick around. My concern then is that we could end up in a state where the primary still points at the chain in the old shard (in chains), but we've already updated the chain name in that shard (in ethereum_networks), and so when graph-node starts up again, it won't find that chain since it first looks at the chains table to figure out which shard the block cache is in, but can't find the corresponding entry in ethereum_networks.

I think the only change that's needed from what you have right now is to move the call to chain_store.update_name(&new_name)?; out of the transaction block. When the txn commits we have an entry for chain in the primary that points to the new shard and is correct. We'd also have an entry for chain-old that points at the old shard and is incorrect (ethereum_networks there still has the entry for chain), but that's tolerable since graph-node should still start up though I am not 100% confident about that.

Unfortunately, this is really tricky, and testing all possible points where an interrupt (or error, e.g., because the db crashed) could happen is very hard.

Copy link
Collaborator

@lutter lutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the change to the transaction block (moving chain_store.update_name(&new_name)? to after it) before merging. The other things are more suggestions.

// Drop the foriegn key constraint on deployment_schemas

// Fetch the current last_value from the sequence
let result = sql_query("SELECT last_value FROM chains_id_seq").get_result::<ChainIdSeq>(&conn)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just do select nextval(chains_id_seq) here instead of incrementing manually or even select 'chain' || nextval(chains_id_seq)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be a bit cleaner to put this logic into a method fn block_store::primary::Chain::allocate_chain that does basically what lines 188-197 do here and returns a Chain. That way, you don't need a add_chain_store_inner, and the logic is closer to the 'normal' flow where Chain::add_chain gets called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't using next_value wouldn't that increment the sequence too than not? I manually incremented because i don't want the increment in the sequence myself but just predict the next value.

)
.execute(&conn)?;
Ok(())
})?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I am trying to get at is that the conn.transaction block puts changes in the primary into a transaction. If that gets interrupted, the transaction in the primary gets rolled back, but changes made in other shards will stick around. My concern then is that we could end up in a state where the primary still points at the chain in the old shard (in chains), but we've already updated the chain name in that shard (in ethereum_networks), and so when graph-node starts up again, it won't find that chain since it first looks at the chains table to figure out which shard the block cache is in, but can't find the corresponding entry in ethereum_networks.

I think the only change that's needed from what you have right now is to move the call to chain_store.update_name(&new_name)?; out of the transaction block. When the txn commits we have an entry for chain in the primary that points to the new shard and is correct. We'd also have an entry for chain-old that points at the old shard and is incorrect (ethereum_networks there still has the entry for chain), but that's tolerable since graph-node should still start up though I am not 100% confident about that.

Unfortunately, this is really tricky, and testing all possible points where an interrupt (or error, e.g., because the db crashed) could happen is very hard.

@incrypto32
Copy link
Contributor Author

@lutter ok to merge this?

Copy link
Collaborator

@lutter lutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's really good now!

@incrypto32 incrypto32 merged commit cf5997d into master Mar 7, 2024
7 checks passed
@incrypto32 incrypto32 deleted the krishna/block-cache branch March 7, 2024 04:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Allow for changing the block cache shard
2 participants