Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune unnecessary information from TransportNodesInfoAction.NodeInfoRequest #99938

Merged
merged 6 commits into from Oct 2, 2023

Conversation

NEUpanning
Copy link
Contributor

@NEUpanning NEUpanning commented Sep 27, 2023

Today every node-level TransportNodesInfoAction.NodeInfoRequest carries the entire top-level NodesInfoRequest, but it only needs to carry the metrics. The uneccessary information like nodes list incurs extra effort and network traffic.

This commit trims these things down. There are certainly other places (e.g. ,TransportNodesInfoAction.NodeStatsRequest ) where this pruning request would be valuable, but here we only focus on the TransportNodesInfoAction.NodeInfoRequest for the performance optimization of nodes info action.

Relates #99744

@elasticsearchmachine elasticsearchmachine added v8.11.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Sep 27, 2023
@DaveCTurner DaveCTurner added >enhancement :Data Management/Stats Statistics tracking and retrieval APIs and removed needs:triage Requires assignment of a team area label labels Sep 27, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Sep 27, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@DaveCTurner
Copy link
Contributor

As it stands this doesn't pass ./gradlew precommit, and it fails the BWC test suite too (e.g. ./gradlew bwctest)

@DaveCTurner DaveCTurner self-assigned this Sep 27, 2023
@NEUpanning
Copy link
Contributor Author

Got it.I will make sure my code passes ./gradlew precommit.

@DaveCTurner
Copy link
Contributor

I like that we extract NodeInfoMetrics to a top-level class here, but that does make it hard to see the actual change in behaviour introduced by this PR. Could we do that refactoring first in a separate PR which doesn't change any behaviour and then follow up with the optimization?

@NEUpanning
Copy link
Contributor Author

Sure, i will do that refactoring first in a separate PR and then open this PR for the optimization.

@NEUpanning
Copy link
Contributor Author

I do that refactoring in a this PR (#99990 ). Could you take a look?

@NEUpanning
Copy link
Contributor Author

I have updated commit. Could you please take a look?

@DaveCTurner
Copy link
Contributor

Much better :) But it still fails the BwC test suite as I mentioned above.

@NEUpanning
Copy link
Contributor Author

Sorry, i forgot about that.I will make sure my code passes ./gradlew bwctest.

@NEUpanning
Copy link
Contributor Author

I can't even pass ./gradlew bwctest locally on the main branch. Errors like this
Caused by: java.lang.RuntimeException: Timed out after PT2M waiting for ports files for: { cluster: 'test-cluster', node: 'test-cluster-1' }
I pushed commit about backwards compatibility. Could we let elasticsearchmachine test this?

if (out.getTransportVersion().onOrAfter(NODE_INFO_REQUEST_VERSIONS_ADDED)) {
this.nodesInfoMetrics.writeTo(out);
} else {
this.request.writeTo(out);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would work to create a new NodesInfoRequest right here just for sending out to older nodes (with no node IDs, just the metrics). That way we wouldn't have to keep track of the request and the metrics separately, because that's going to be problematic if we ever tried to send one of these things over the wire twice with two different transport versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big thanks, such a great idea! This way we don't have to worry about backwards compatibility. I have force pushed commits to make them easier to review. fa7c890

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry - this looks good for the BwC path but it's kind of a workaround and we'd want to remove it as soon as possible (i.e. in v9.0). For future versions we should be sending only a NodesInfoMetrics as per your original commit.

Also please don't force-push to branches once they're being reviewed - as a reviewer it makes it much harder to keep track of what I have/haven't seen.

@DaveCTurner DaveCTurner changed the title Prune unnecessary information of TransportNodesInfoAction.NodeInfoReqest Prune unnecessary information from TransportNodesInfoAction.NodeInfoRequest Oct 1, 2023
@DaveCTurner
Copy link
Contributor

I pushed 7ccf073 adding a changelog entry for this PR - make sure to include that in future commits too please.

@NEUpanning
Copy link
Contributor Author

NEUpanning commented Oct 1, 2023

Got it. In the future i will add a changelog and not force-push.

@NEUpanning
Copy link
Contributor Author

For future versions i would like to do this work, at that time you can mention me.

Ah sorry - this looks good for the BwC path but it's kind of a workaround and we'd want to remove it as soon as possible (i.e. in v9.0). For future versions we should be sending only a NodesInfoMetrics as per your original commit.

@DaveCTurner
Copy link
Contributor

For future versions i would like to do this work, at that time you can mention me.

Thanks for the offer! If we get this PR right then preparing for the 9.0 release will just be a case of removing the checks on transport versions, which is easiest done across the whole codebase in one go so, to set expectations, I doubt we will ask for help with that.

@NEUpanning
Copy link
Contributor Author

I think i finally understand what you mean. We only track NodesInfoMetrics, for older nodes a NodesInfoMetrics is sent, for newer nodes a new NodesInfoRequest (with no node IDs) is sent and for future versions removing the operation of the older nodes.

@DaveCTurner
Copy link
Contributor

for older nodes a NodesInfoMetrics is sent, for newer nodes a new NodesInfoRequest (with no node IDs) is sent

I think you meant the other way around: to current/future versions we want to send a bare NodesInfoMetrics but older nodes require us to send a complete NodesInfoRequest (which we can just synthesize here if needed, omitting all the node IDs)

…ut older nodes require us to send a complete NodesInfoRequest
@NEUpanning
Copy link
Contributor Author

Yes, that's a clerical error. I pushed b058d43 .

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, one naming nit/suggestion and then I expect this is ready to go.

NEUpanning and others added 2 commits October 2, 2023 18:58
transport version naming

Co-authored-by: David Turner <david.turner@elastic.co>
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner
Copy link
Contributor

@elasticmachine test this please

@DaveCTurner DaveCTurner merged commit df52701 into elastic:main Oct 2, 2023
14 checks passed
@NEUpanning
Copy link
Contributor Author

Thanks for your review David :)

@DaveCTurner
Copy link
Contributor

You're welcome @NEUpanning , thanks for your contribution!

@NEUpanning
Copy link
Contributor Author

It's my pleasure and I learned a lot.

jakelandis pushed a commit to jakelandis/elasticsearch that referenced this pull request Oct 2, 2023
…equest (elastic#99938)

There's no need to include the whole top-level `NodesInfoRequest` in the
requests for info from individual nodes, and this can add substantial
overhead if there are lots of nodes in the cluster. With this commit we
drop the wrapper in favour of just the parts of the top-level request
needed for the node-level processing.

Relates elastic#99744
@NEUpanning
Copy link
Contributor Author

I would like to make this change to TransportNodesStatsAction.NodeStatsRequest in the same way, do you think I should do it?

@DaveCTurner
Copy link
Contributor

I would like to make this change to TransportNodesStatsAction.NodeStatsRequest in the same way, do you think I should do it?

Sounds good to me, yes. I think the same approach with 2 PRs will work fine there too.

@DaveCTurner
Copy link
Contributor

FWIW we have the same inefficiency in several other places too, I opened #100878 to list them all.

@NEUpanning
Copy link
Contributor Author

I would like to make this change to TransportNodesStatsAction.NodeStatsRequest in the same way

I did the first step, refatoring NodesStatsRequest in #100883

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Stats Statistics tracking and retrieval APIs >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants