More Compact Serialization of Metadata #82608

@original-brownbear original-brownbear commented Jan 14, 2022

Serialize the map of hashes to mappings and then lookup from the map instead
of serializing them over and over for each index to make full cluster state
transport messages much smaller in the common case of many duplicate mappings.

This should make requests for the full cluster state (or at least the state including mappings) considerably cheaper for the master node in terms of memory, CPU, and network. It also saves a lot of buffers on the coordinating/sending node, as well as the CPU otherwise spent deduplicating mappings.

relates #77466
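
For readers unfamiliar with the idea, here is a minimal sketch of hash-based deduplication during serialization. It is not the code from this PR and not Elasticsearch's wire format: the class `DedupMappingsSketch`, the `Mapping` type, and the stream layout are all hypothetical, and it uses plain `java.io` data streams instead of Elasticsearch's stream classes. Each distinct mapping is written once, keyed by its hash, and every index then carries only the hash.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupMappingsSketch {

    /** Hypothetical stand-in for a mapping: a content hash plus the (potentially large) source. */
    static final class Mapping {
        final String hash;
        final String source;

        Mapping(String hash, String source) {
            this.hash = hash;
            this.source = source;
        }
    }

    /** Writes every distinct mapping once, then a single hash per index. */
    static byte[] write(Map<String, Mapping> mappingByIndex) {
        // Collect the distinct mappings, keyed by hash.
        Map<String, Mapping> byHash = new LinkedHashMap<>();
        for (Mapping m : mappingByIndex.values()) {
            byHash.putIfAbsent(m.hash, m);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(byHash.size());
            for (Mapping m : byHash.values()) {
                out.writeUTF(m.hash);
                out.writeUTF(m.source); // writeUTF is limited to 65535 encoded bytes; fine for a sketch
            }
            out.writeInt(mappingByIndex.size());
            for (Map.Entry<String, Mapping> e : mappingByIndex.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue().hash); // only the hash, not the mapping itself
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the hash-to-mapping table first, then resolves each index's hash against it. */
    static Map<String, Mapping> read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int distinct = in.readInt();
            Map<String, Mapping> byHash = new HashMap<>();
            for (int i = 0; i < distinct; i++) {
                String hash = in.readUTF();
                byHash.put(hash, new Mapping(hash, in.readUTF()));
            }
            int indices = in.readInt();
            Map<String, Mapping> mappingByIndex = new LinkedHashMap<>();
            for (int i = 0; i < indices; i++) {
                String index = in.readUTF();
                mappingByIndex.put(index, byHash.get(in.readUTF()));
            }
            return mappingByIndex;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, indices that share a mapping also end up sharing a single `Mapping` instance after reading, so the receiving side does not need a separate deduplication pass.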

@original-brownbear original-brownbear added >enhancement :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.1.0 labels Jan 14, 2022
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Jan 14, 2022
@elasticmachine

Pinging @elastic/es-distributed (Team:Distributed)

    if (in.getVersion().onOrAfter(MAPPINGS_AS_HASH_VERSION)) {
        final int mappings = in.readVInt();
        if (mappings > 0) {
            final Map<String, MappingMetadata> mappingMetadataMap = new HashMap<>(mappings);

The HashMap constructor accepts the initial capacity, not the expected number of elements. It needs to be sized somewhat higher than `mappings`; otherwise the map will have to be resized/rehashed while it is being filled.

See https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/Maps.java#L273
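
As an illustration of the sizing point, a minimal sketch of a helper that accounts for the default load factor of 0.75 could look like the following. The class and method names (`PresizedMaps`, `capacityFor`, `newHashMapWithExpectedSize`) are hypothetical and not part of this PR; the calculation follows the same idea as the Guava helper linked above without claiming to match it exactly.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMaps {

    // Default load factor used by java.util.HashMap.
    private static final double DEFAULT_LOAD_FACTOR = 0.75;

    /** Returns an initial capacity large enough to hold expectedSize entries without a resize. */
    static int capacityFor(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize must be >= 0, got " + expectedSize);
        }
        // The double-to-int cast saturates at Integer.MAX_VALUE; HashMap clamps the capacity itself.
        return (int) Math.ceil(expectedSize / DEFAULT_LOAD_FACTOR);
    }

    /** Creates a HashMap sized for the expected number of entries rather than the raw capacity. */
    static <K, V> Map<K, V> newHashMapWithExpectedSize(int expectedSize) {
        return new HashMap<>(capacityFor(expectedSize));
    }

    public static void main(String[] args) {
        // In OpenJDK, new HashMap<>(100) rounds up to a table of 128 with a resize threshold of 96,
        // so the 97th insert triggers a rehash. Asking for capacityFor(100) avoids that.
        System.out.println(capacityFor(100));
        Map<String, Integer> map = newHashMapWithExpectedSize(100);
        for (int i = 0; i < 100; i++) {
            map.put("key-" + i, i);
        }
        System.out.println(map.size());
    }
}
```

On Java 19 and later, `HashMap.newHashMap(int numMappings)` provides the same behavior out of the box.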

True, though I guess it might be worthwhile to have a general fix for this. We seem to always pre-size with capacity == element count in deserialization. Technically, we could probably move to accounting for the load factor, but I wouldn't expect too much from it (especially when the key's hash code is essentially free).

@original-brownbear

Thanks Ievgen!
