
repair: high latency of RBNO bootstrap #15505

Closed
Deexie opened this issue Sep 21, 2023 · 11 comments


Deexie commented Sep 21, 2023

Sometimes when a new node is bootstrapped with RBNO enabled,
the state change indicating bootstrap finish is delayed. Thus, tests
wait idly until a server is added, even though the logs show that
the bootstrap's already finished.

Deexie self-assigned this Sep 21, 2023
Deexie added a commit to Deexie/scylla that referenced this issue Sep 21, 2023
Before integration with the task manager, the state of a single shard
repair was kept in repair_info. The repair_info object was destroyed
immediately after the shard repair finished.

In the integration process, repair_info's fields were moved to
shard_repair_task_impl, as the two served similar purposes. However,
shard_repair_task_impl isn't destroyed immediately; it is kept in the
task manager for task_ttl seconds after it completes. Thus, some of
repair_info's fields have their lifetime prolonged, which delays the
repair state change.

Release shard_repair_task_impl's resources immediately after the shard
repair is finished.

Fixes: scylladb#15505.
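
To make the lifetime problem concrete, here is a minimal C++ sketch of the pattern the commit message describes. All names (`repair_state`, `shard_repair_task`, `done()`) are hypothetical stand-ins, not the actual Scylla classes:

```cpp
#include <memory>

// Hypothetical heavyweight per-shard repair state; in Scylla this is
// where fields moved from repair_info would live, e.g. an
// effective_replication_map_ptr that pins token metadata while it exists.
struct repair_state { /* ... */ };

// Hypothetical stand-in for shard_repair_task_impl: the task manager
// keeps this object alive for task_ttl seconds after completion so its
// status can still be queried, so its destructor runs long after the work.
class shard_repair_task {
    std::unique_ptr<repair_state> _state = std::make_unique<repair_state>();
public:
    void run() {
        // ... perform the shard repair using *_state ...

        // The shape of the fix: drop the heavyweight state as soon as
        // the work is done, instead of waiting for ~shard_repair_task
        // after task_ttl expires.
        _state.reset();
    }
    bool done() const { return !_state; } // status stays queryable
};
```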
mykaul added this to the 5.4 milestone Sep 21, 2023
mykaul added the Backport candidate and backport/5.2 labels Sep 21, 2023
Deexie added a commit to Deexie/scylla that referenced this issue Sep 21, 2023
Deexie added a commit to Deexie/scylla that referenced this issue Sep 22, 2023
Deexie added a commit to Deexie/scylla that referenced this issue Sep 25, 2023
avikivity commented

@Deexie does this merit a backport, or does it only impact tests? In any case I think the code is new enough that 5.2 doesn't have the problem.


Deexie commented Oct 28, 2023

> @Deexie does this merit a backport, or does it only impact tests? In any case I think the code is new enough that 5.2 doesn't have the problem.

It impacts users as well. And 5.2 contains shard_repair_task_impl, so I think it's worth backporting.


denesb commented Oct 30, 2023

>> @Deexie does this merit a backport, or does it only impact tests? In any case I think the code is new enough that 5.2 doesn't have the problem.
>
> It impacts users as well. And 5.2 contains shard_repair_task_impl, so I think it's worth backporting.

There are many conflicts, @Deexie please prepare a backport PR.


Deexie commented Oct 30, 2023

>>> @Deexie does this merit a backport, or does it only impact tests? In any case I think the code is new enough that 5.2 doesn't have the problem.
>>
>> It impacts users as well. And 5.2 contains shard_repair_task_impl, so I think it's worth backporting.
>
> There are many conflicts, @Deexie please prepare a backport PR.

The change for this issue modifies what was added in #15005, and the fix for the related issue #14966 is not backported. Of course, I can skip that part (and in this case that would be really easy). But the changes are closely related, and #14966 seems to be much more harmful than this one. @denesb, what do you think about opening a joint backport PR to branch-5.2?


denesb commented Oct 30, 2023

> The change for this issue modifies what was added in #15005, and the fix for the related issue #14966 is not backported. Of course, I can skip that part (and in this case that would be really easy). But the changes are closely related, and #14966 seems to be much more harmful than this one. @denesb, what do you think about opening a joint backport PR to branch-5.2?

Yes please.

Deexie added a commit to Deexie/scylla that referenced this issue Oct 30, 2023
(cherry picked from commit 0474e15)

Deexie commented Oct 30, 2023

Oh, never mind, I checked for compaction tasks in the middle of the cherry-pick... 5.2 does not contain #14966, sorry. I opened #15875 with this change only.


asias commented Nov 6, 2023

> Sometimes when a new node is bootstrapped with RBNO enabled, the state change indicating bootstrap finish is delayed. Thus, tests wait idly until a server is added, even though the logs show that the bootstrap's already finished.

It is not clear to me what the problem is, based on the description above. What "state change" are you referring to? I first thought it was about CQL read/write latency.


Deexie commented Nov 6, 2023

effective_replication_map_ptr is kept in memory longer than it used to be. From a comment on the erm class:

> Holding to this object keeps the associated token_metadata_ptr alive, keeping the token metadata version alive and seen as in use.
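
A minimal sketch of the mechanism that comment implies, using a hypothetical `version_tracker` type (not the real erm class): as long as any shared pointer to the object is alive, the version it pins stays registered as in use, so a coordinator waiting for old versions to be released cannot make progress:

```cpp
#include <iostream>
#include <memory>
#include <set>

// Hypothetical: pins a token-metadata version for the object's lifetime.
struct version_tracker {
    std::set<long>& live_versions;
    long version;
    version_tracker(std::set<long>& live, long v)
        : live_versions(live), version(v) {
        live_versions.insert(version); // version becomes "seen as in use"
    }
    version_tracker(const version_tracker&) = delete;
    ~version_tracker() { live_versions.erase(version); } // released
};

int main() {
    std::set<long> live; // versions still "seen as in use"

    auto erm = std::make_shared<version_tracker>(live, 42);
    auto lingering_holder = erm; // e.g., a task kept alive for task_ttl

    erm.reset();
    std::cout << "in use after first release: " << live.count(42) << "\n"; // 1

    lingering_holder.reset(); // the lingering task finally drops it
    std::cout << "in use after last release: " << live.count(42) << "\n"; // 0
}
```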

Deexie added a commit to Deexie/scylla that referenced this issue Nov 6, 2023
(cherry picked from commit 0474e15)

asias commented Nov 7, 2023

> effective_replication_map_ptr is kept in memory longer than it used to be. From a comment on the erm class:
>
>> Holding to this object keeps the associated token_metadata_ptr alive, keeping the token metadata version alive and seen as in use.

Yes, it helps to release the effective_replication_map_ptr earlier. But do you see any actual problem, e.g., repair taking longer to finish, or users facing higher CQL read/write latency?

You mentioned in the opening message:

> Thus, tests wait idly until a server is added, even though the logs show that the bootstrap's already finished.

Not releasing effective_replication_map_ptr should not have any impact on when the bootstrap is considered finished.

denesb pushed a commit that referenced this issue Nov 7, 2023
(cherry picked from commit 0474e15)

Closes #15875

Deexie commented Nov 7, 2023

>> effective_replication_map_ptr is kept in memory longer than it used to be. From a comment on the erm class:
>>
>>> Holding to this object keeps the associated token_metadata_ptr alive, keeping the token metadata version alive and seen as in use.
>
> Yes, it helps to release the effective_replication_map_ptr earlier. But do you see any actual problem, e.g., repair taking longer to finish, or users facing higher CQL read/write latency?
>
> You mentioned in the opening message:
>
>> Thus, tests wait idly until a server is added, even though the logs show that the bootstrap's already finished.
>
> Not releasing effective_replication_map_ptr should not have any impact on when the bootstrap is considered finished.

Yes, I saw the problem while writing a test in test/topology. It was a test related to the task manager, so I set task_ttl to a really big number; a 2-node cluster was created immediately, but creating a 3-node cluster took task_ttl.

I didn't really check the actual path; I focused on the symptoms instead. So I do not know exactly how the still-alive erm causes ManagerClient::server_add to hang, but I can find out if that's important.
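
A toy, self-contained simulation of just that symptom (assumed behavior, no actual Scylla or test-harness code): the work completes almost immediately, but an observer waiting for the pinned resources to be released blocks for the full task_ttl:

```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    using namespace std::chrono;
    const auto task_ttl = seconds(3); // imagine this set "really big" in a test

    std::atomic<bool> resources_released{false};
    auto start = steady_clock::now();

    std::thread task([&] {
        // The repair work finishes almost immediately here...
        std::this_thread::sleep_for(task_ttl); // ...but the task object lingers
        resources_released = true;             // resources freed only at expiry
    });

    // Stand-in for the test's "wait for server added" loop.
    while (!resources_released) {
        std::this_thread::sleep_for(milliseconds(50));
    }
    task.join();

    std::cout << "observed completion after "
              << duration_cast<seconds>(steady_clock::now() - start).count()
              << "s\n"; // roughly task_ttl, not the actual work duration
}
```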


mykaul commented Dec 13, 2023

Backport to 5.2 is via #15875 - what else remains here?

mykaul removed the Backport candidate and backport/5.2 labels Dec 13, 2023