Cache Node Assignment Service #958
Conversation
Force-pushed from 75e651e to da142b3
We already have code that does cache node assignment. How is this different from the current logic? On closer reading of the code, it seems we are adding dynamic chunks. It would be good to clarify the design intent here. If we want to support more chunks per cache node, why not increase the number of slots dynamically? Why do we need to change chunk assignment?
Hi Suman! Great to see you. Good question! Yes, we are adding dynamic chunks. However, the goal is not just to support more chunks per cache node but also to improve disk space utilization. I'm sure you already know, but chunks are not always a fixed size. While we aim for uniform size, they can vary. Since slots assume each chunk is the same size, this can lead to under-utilization of disk space if chunks are smaller than the assumed size. For example, with a 3TB disk and 15GB per slot, we get 200 slots. If a chunk is only 10GB, there’s still 5GB of unused space allocated for that slot. The change in assignment algorithm addresses this. We plan to use a bin-packing algorithm to pack all chunks into the least number of cache nodes, optimizing disk space usage and reducing costs. Let me know if that answers your question! And apologies for the lack of context in the PR—it's still very much a work in progress 😅
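To make the intent concrete, here is a minimal sketch of the kind of bin-packing pass described above (first-fit decreasing); the class and method names are hypothetical, not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: pack chunks (by size in GB) onto the fewest cache nodes
// of fixed capacity using a first-fit-decreasing heuristic.
public class BinPackingSketch {
  record Node(long capacityGb, List<Long> chunksGb) {
    long usedGb() {
      return chunksGb.stream().mapToLong(Long::longValue).sum();
    }
  }

  static List<Node> pack(List<Long> chunkSizesGb, long nodeCapacityGb) {
    List<Long> sorted = new ArrayList<>(chunkSizesGb);
    sorted.sort(Comparator.reverseOrder()); // place the largest chunks first
    List<Node> nodes = new ArrayList<>();
    for (long chunkGb : sorted) {
      // use the first node with enough free space, otherwise open a new node
      Node target =
          nodes.stream()
              .filter(n -> n.usedGb() + chunkGb <= n.capacityGb())
              .findFirst()
              .orElseGet(
                  () -> {
                    Node fresh = new Node(nodeCapacityGb, new ArrayList<>());
                    nodes.add(fresh);
                    return fresh;
                  });
      target.chunksGb().add(chunkGb);
    }
    return nodes;
  }

  public static void main(String[] args) {
    // With fixed 15GB slots a 10GB chunk strands 5GB of disk; packing by actual
    // size lets the same node hold more chunks.
    List<Long> chunkSizesGb = List.of(10L, 12L, 15L, 8L, 14L, 11L);
    System.out.println(pack(chunkSizesGb, 3000L).size() + " cache node(s) needed");
  }
}
```

First-fit decreasing is only one possible heuristic; the point is that assignment is driven by actual chunk sizes instead of a fixed per-slot size.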
Hi Kai, great to see you again! Thanks for the context. It would be ideal to write a design doc or a discussion for this before we start coding, to document the goals of this change. It seems that the goal of this change is to improve disk utilization, which is a great goal. I am, however, curious why so much disk space goes unused in practice. If you are using size-based rollover, at any given time only a small percentage of chunks would differ from the cache slot size; in steady state the chunks should all be roughly the same size. If it is happening regularly, is it not a tuning issue? Also, even if we temporarily pack smaller chunks better (say, after a deployment), would we not get back to the same configuration as a fixed-slot setup once the next rollover happens in an hour? I am asking since I may be missing a case where there are longer-term advantages to this change.

Also, the current implementation with a static number of slots employs a constant-work paradigm for the entire cluster. This makes cluster operations mechanical, predictable, and easy to automate. The dynamic slot model proposed here makes the system much less predictable overall, which is a big sacrifice from a design perspective. To carefully evaluate the effect of these changes, it may be better to break this change down into a few smaller changes:
It is possible that we can recover the disk utilization after the first two changes. It seems we are also changing the cache node assignment in this PR; I think that change may be better done independently of this one. Overall, I think it may be ideal if we approach this change in a more incremental manner.
Force-pushed from da142b3 to 44f1592
Force-pushed from 99baf84 to 7c90558
astra/src/main/java/com/slack/astra/chunk/ReadOnlyChunkImpl.java
astra/src/test/java/com/slack/astra/chunk/ReadOnlyChunkImplTest.java
astra/src/test/java/com/slack/astra/chunkManager/CachingChunkManagerTest.java
astra/src/test/java/com/slack/astra/metadata/cache/CacheNodeAssignmentSerializerTest.java
astra/src/test/java/com/slack/astra/metadata/cache/CacheNodeAssignmentStoreTest.java
astra/src/test/java/com/slack/astra/metadata/cache/CacheNodeAssignmentTest.java
astra/src/test/java/com/slack/astra/metadata/cache/CacheNodeMetadataStoreTest.java
astra/src/main/java/com/slack/astra/clusterManager/CacheNodeAssignmentService.java
Force-pushed from 75e8c77 to 5e26cf0
Looking pretty good! This is a pretty huge change, but you've done a good job of isolating the new behavior behind a "default false" system property.
I took a look over the code coverage as well, and it looks like you did a decent job of exercising the new paths. I think with a couple small tweaks this is probably good to merge up.
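As a rough illustration of that gating pattern (the property name and method names below are hypothetical, not necessarily the ones used in this PR):

```java
// Hypothetical sketch of gating new behavior behind a system property that defaults
// to false, so existing clusters keep the current fixed-slot behavior unless the
// flag is explicitly enabled.
public class FeatureGateSketch {
  private static final boolean DYNAMIC_CHUNK_ASSIGNMENT_ENABLED =
      Boolean.parseBoolean(System.getProperty("astra.dynamicChunkAssignment", "false"));

  void initialize() {
    if (DYNAMIC_CHUNK_ASSIGNMENT_ENABLED) {
      // new path: follow assignments published by the cache node assignment service
      listenForAssignments();
    } else {
      // existing path: register a fixed number of cache slots
      createFixedSlots();
    }
  }

  void listenForAssignments() {}

  void createFixedSlots() {}
}
```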
astra/src/main/java/com/slack/astra/chunkManager/CachingChunkManager.java
astra/src/main/java/com/slack/astra/clusterManager/CacheNodeAssignmentService.java
astra/src/main/java/com/slack/astra/clusterManager/ClusterHpaMetricService.java
astra/src/main/java/com/slack/astra/clusterManager/ClusterMonitorService.java
Force-pushed from 1d2075c to a660974
Force-pushed from ace315d to 01852ca
Force-pushed from ad99d00 to 9a17959
Includes tests, serializers, proto file changes
Includes changes to ReadOnlyChunkImpl
Force-pushed from 9a17959 to 6c0c33e
Couple of nits, but I would consider addressing them in a future PR. I'm not seeing anything that sticks out that we haven't discussed, and you have already successfully deployed this to several test environments.
@@ -128,6 +171,142 @@ public ReadOnlyChunkImpl(
    LOG.debug("Created a new read only chunk - zkSlotId: {}", slotId);
  }

  /*
  ======================================================
Nit: These types of comments tend to go stale pretty quick, as folks refactor and miss a comment block several methods up when they're adding new methods.
    snapshotMetadataStore.close();
    searchMetadataStore.close();

    LOG.debug("Closed chunk");
Nit: This would be more useful if it contained information about which chunk was closed; otherwise this isn't much different from a metric.
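For example, something along these lines (hypothetical, assuming the slot id is in scope here the way it is in the constructor log above):

```java
// Identify which chunk was closed so individual chunk lifecycles can be traced from logs.
LOG.debug("Closed chunk - zkSlotId: {}", slotId);
```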
Summary
This PR adds a basic draft of the cache node assignment service plus tests. Several other changes were added in order to support the service; they include:
- CacheNodeAssignment and CacheNodeMetadata classes (plus serializers, ZK store classes, proto file changes)
- CachingChunkManager listens for changes in assignments and downloads data as appropriate

Requirements