Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cluster: small performance optimisations #780

Merged
merged 7 commits into from Aug 22, 2023

Conversation

wprzytula
Copy link
Collaborator

This is a bunch of performance-related small optimisation of the cluster module, mainly focused on avoiding cloning where possible.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • [ ] I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
  • [ ] I added appropriate Fixes: annotations to PR description.

@wprzytula wprzytula added the performance Improves performance of existing features label Jul 31, 2023
@wprzytula wprzytula requested review from piodul and havaker July 31, 2023 13:23
@wprzytula wprzytula force-pushed the cluster-small-optimisations branch from a27b28f to 095c6fb Compare July 31, 2023 13:24
Keyspace strategies were cloned and allocated into a Vec, just for
creating an iterator to references to them in order to construct a
`ReplicaLocator`. It was done due to lifetime issues (in order to send
references to another thread, they must be 'static).
The problem is solved by sending keyspaces to another thread for
constructing `ReplicaLocator` and then moving it back to the calling
thread afterwards. Clones are elided. Hooray!
By leveraging branched variable initialisation, cloning datacenters and
racks is no longer necessary.
This one wasn't particularly hard.
There no reason why the method should be async.
@wprzytula
Copy link
Collaborator Author

v2: rebased on main.

scylla/src/transport/cluster.rs Show resolved Hide resolved
scylla/src/transport/session.rs Outdated Show resolved Hide resolved
scylla/src/transport/cluster.rs Show resolved Hide resolved
There was a convoluted logic that can be easily expressed using
itertools' `find_or_first()` iterator method.
.map(|node| node.get_working_connections())
.flatten_ok()
// By an invariant `self.known_peers` is nonempty, so the returned iterator
// is nonempty, too.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that? If all node pools are in MaybePoolConnections::Broken state, every call to node.get_working_connections() will fail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, and hence the resulting iterator will consist of only Err variants.
flatten_ok() does not remove any Err variants; what it does is flattening the Ok variants; that is, Ok(Ok(x)) becomes Ok(x).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a quite large change in semantics. Previously, if all nodes had a broken connection pool then the function returned an error; otherwise you would get a list of current connections. Now, you are putting the responsibility for handling errors to the caller. Despite this change the code seems to be working due to how the current callers are handling the errors from the iterator.

I'd rather see the old semantics, i.e. the function should either fail immediately if there are no connections or return an iterator that returns just Arc<Connection>. Alternatively (but not preferably) you could document the meaning of what the iterator really returns.

scylla/src/transport/session.rs Outdated Show resolved Hide resolved
scylla/src/transport/session.rs Show resolved Hide resolved
Copy link
Contributor

@havaker havaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a nit, LGTM

scylla/src/transport/cluster.rs Show resolved Hide resolved
Copy link
Collaborator

@piodul piodul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One request, LGTM otherwise.

Comment on lines 1874 to 1879
let handles = connections_iter.map(|c| async {
match c {
Ok(c) => c.fetch_schema_version().await,
Err(err) => Err(err),
}
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would and_then instead of map work? Similarly above, lines 868-873.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably mean "instead of match". Then I don't think so, because the operation is done inside an async block, which is a known limitation of functional idioms in Rust.

.map(|node| node.get_working_connections())
.flatten_ok()
// By an invariant `self.known_peers` is nonempty, so the returned iterator
// is nonempty, too.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a quite large change in semantics. Previously, if all nodes had a broken connection pool then the function returned an error; otherwise you would get a list of current connections. Now, you are putting the responsibility for handling errors to the caller. Despite this change the code seems to be working due to how the current callers are handling the errors from the iterator.

I'd rather see the old semantics, i.e. the function should either fail immediately if there are no connections or return an iterator that returns just Arc<Connection>. Alternatively (but not preferably) you could document the meaning of what the iterator really returns.

@wprzytula
Copy link
Collaborator Author

Reworked iter_working_connections() so that now it returns either an error or an iterator with successful working connections (with at least one of them).

`get_working_connections()` not only allocated `Vec`s for all nodes, but
it even cloned them to one big `Vec`. At least the latter could be
avoided by returning an iterator over connections.
The new replacement method, `iter_working_connections()`, was put on
`ClusterData` instead of `Cluster`, because it borrows from
`ClusterData`, so lifetime issues would be brought otherwise.
The two uses of `get_working_connections()`, `Session::prepare()` and
`Session::check_schema_agreement()`, were updated to use
`iter_working_connections()`, and `get_working_connections()` was
deleted as not needed anymore.
As `join_all()` accepts any `IntoIterator`, it accepts iterators in
particular. Allocating a `Vec` is unnecessary.
@piodul piodul merged commit decac4e into scylladb:main Aug 22, 2023
8 checks passed
@wprzytula wprzytula deleted the cluster-small-optimisations branch August 22, 2023 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Improves performance of existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants