[core] Trigger global gc when plasma store is under pressure. #15775
Conversation
Why don't we implement this in …
@@ -313,6 +313,10 @@ class ObjectManager : public ObjectManagerInterface,

  int64_t GetMemoryCapacity() const { return config_.object_store_memory; }

  double GetUsedMemoryPercentage() const {
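The hunk cuts off before the helper's body; here is a minimal sketch of what it presumably computes, with `used_memory_` as a hypothetical field name (the real implementation may query the store differently):

```cpp
// Sketch only: used_memory_ is a hypothetical counter of current plasma
// usage; the actual field or accessor in ObjectManager may differ.
double GetUsedMemoryPercentage() const {
  if (config_.object_store_memory <= 0) {
    return 0.0;  // No capacity configured; report no pressure.
  }
  return static_cast<double>(used_memory_) /
         static_cast<double>(config_.object_store_memory);
}
```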
I think that we already have something like this in the resource manager that you could reuse.
@stephanie-wang I only see a resource manager in the GCS, which probably doesn't fit here. Is that the one you mentioned?
I was thinking of the ClusterResourceScheduler, although I'm not totally sure it's there. @ericl should know since I think he added the memory availability reporting for the autoscaler.
@@ -519,20 +519,27 @@ void NodeManager::FillResourceReport(rpc::ResourcesData &resources_data) {
  cluster_resource_scheduler_->FillResourceUsage(resources_data);
  cluster_task_manager_->FillResourceUsage(resources_data);

  // If plasma store is under high pressure, we should try to schedule a global gc.
  bool plasma_high_pressure =
      object_manager_.GetUsedMemoryPercentage() > high_plasma_storage_usage_;
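The hunk ends before showing how the flag is used; a plausible continuation, assuming the report proto carries a `should_global_gc` field and a throttler guards the trigger (`TriggerGlobalGC()`, `global_gc_throttler_`, and `should_global_gc_` are names inferred from this thread, not necessarily the merged code):

```cpp
// Sketch of how the pressure flag might feed the resource report.
if (plasma_high_pressure && global_gc_throttler_.AbleToRun()) {
  TriggerGlobalGC();  // Marks should_global_gc_ so the next report broadcasts it.
}
if (should_global_gc_) {
  resources_data.set_should_global_gc(true);
  should_global_gc_ = false;
}
```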
Should we adjust the interval based on the current memory usage instead of using a static threshold/interval?
Also, what are the guarantees for when this method gets called? Is it on a timer?
Ah, I've talked with @ericl about this; the conclusion is that 30s should probably be the minimum interval, to avoid overhead on the cluster. The default GC interval on the local node is 10 min here. And this change issues a global GC, which is different from that one.
Besides, we think the global GC should maybe be sent from the GCS, since that's the place that can exercise better global control. For example, if all nodes are at a similar level of memory usage, global GC will be triggered on all of them at the same time, which means N*N message passing that isn't necessary. Although the trigger goes with the heartbeat, the heartbeat is forced to broadcast whenever the global_gc bit is set to true.
This goes with the resource report, which runs periodically (code).
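To make the GCS-based idea concrete, here is a rough sketch of what a GCS-side trigger could look like; every name in it is hypothetical, since this is the proposed enhancement rather than anything in this PR:

```cpp
// Hypothetical GCS-side aggregation: one fan-in of N reports and one
// fan-out of N commands, instead of every pressured node forcing its
// heartbeat out to all N peers (the N*N case described above).
void MaybeTriggerGlobalGcFromGcs() {
  bool any_node_pressured = false;
  for (const auto &entry : latest_reports_) {  // node_id -> last report
    if (entry.second.object_store_used_fraction() > kHighPlasmaUsage) {
      any_node_pressured = true;
      break;
    }
  }
  if (any_node_pressured && global_gc_throttler_.AbleToRun()) {
    BroadcastGlobalGcToAllNodes();  // Single coordinated trigger.
  }
}
```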
Ah I see, thanks for explaining!
Yeah for the report, I was just worried we might miss the GC call because this handler doesn't get called soon enough. This seems fine, though.
I like the GCS-based report idea. Can you create an enhancement issue? Would there be any problem with that approach?
> Ah I see, thanks for explaining!
> Yeah for the report, I was just worried we might miss the GC call because this handler doesn't get called soon enough. This seems fine, though.
I agree with you. I think if memory usage is close to 100%, we probably can't do anything. It helps if usage bumps to 90% suddenly and then increases slowly; that's why, even with this change, it won't fully fix the issue.
This is more like higher-level logic, I think. It's a little strange to put global GC logic into the plasma store, which is a single-node in-memory store; I'd actually prefer moving the GC trigger out of the store. Another reason is that ProcessRequests only runs while processing plasma queries, so if there are no queries it won't trigger GC (I guess :p). Consider this case: the plasma store is under high pressure and every object is in use. GC gets triggered and doesn't help. Afterwards, some objects are released in the application layer, but before the app-level GC is scheduled there is no traffic to the plasma store, which leaves the store under high memory pressure. So I think this is the better place for it.
Also, is it possible to write a Python test for this?
Sure. I was actually thinking about how to add a unit test and realized it isn't easy to do. A Python e2e test totally makes sense.
src/ray/raylet/node_manager.cc (outdated)
auto now = absl::GetCurrentTimeNanos();
if ((should_local_gc_ || now - last_local_gc_ns_ > local_gc_interval_ns_) &&
    now - last_local_gc_ns_ > local_gc_min_interval_ns_) {
if ((should_local_gc_ || (absl::GetCurrentTimeNanos() -
Do we need absl::GetCurrentTimeNanos() - local_gc_throttler_.LastRunTime() here? Doesn't AbleToRun have the same logic in it?
There are two things here: 1) prevent it from running if it runs too frequently; 2) force it to run if it hasn't run for a while. AbleToRun covers 1); 2) is not covered.
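Concretely, the two conditions might combine like this (a sketch; DoLocalGC() is a stand-in for the actual GC call):

```cpp
// 1) AbleToRun() throttles: it fails if a GC ran less than the minimum
//    interval ago.
// 2) The elapsed-time check forces a run once local_gc_interval_ns_ has
//    passed with no explicit request.
bool overdue = absl::GetCurrentTimeNanos() - local_gc_throttler_.LastRunTime() >
               local_gc_interval_ns_;
if ((should_local_gc_ || overdue) && local_gc_throttler_.AbleToRun()) {
  DoLocalGC();  // Hypothetical stand-in for the local GC call.
  should_local_gc_ = false;
}
```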
#include "ray/util/throttler.h"

TEST(ThrottlerTest, BasicTest) {
This means the test takes 20 seconds, right? Can we just use a logical timer like PullManager's get_time?
+1
Let me check it
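For illustration, here's what a logical-clock version of the test might look like, assuming the Throttler gains an injectable now-function; the constructor parameter and the AbleToRun semantics described in the comments are assumptions, not the current API:

```cpp
#include <cstdint>

#include "gtest/gtest.h"
#include "ray/util/throttler.h"

// Sketch: drive the throttler with a logical clock so the test needs no
// real sleeps. Assumes a constructor taking a now-function and an
// AbleToRun() that records the run when it returns true.
TEST(ThrottlerTest, LogicalClock) {
  int64_t fake_now = 100;  // Nonzero so the first call is already "late".
  Throttler throttler(/*interval_ns=*/10, [&fake_now]() { return fake_now; });
  EXPECT_TRUE(throttler.AbleToRun());   // 100 - 0 >= 10: runs, records t=100.
  fake_now += 5;                        // t=105: only 5 ns since last run.
  EXPECT_FALSE(throttler.AbleToRun());
  fake_now += 6;                        // t=111: 11 ns since last run.
  EXPECT_TRUE(throttler.AbleToRun());
}
```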
src/ray/util/throttler.h (outdated)
class Throttler {
 public:
  explicit Throttler(uint64_t interval_ns)
      : last_run_ns_(absl::GetCurrentTimeNanos()), interval_ns_(interval_ns) {}
Can we set last_run_ns_ to zero by default?
(This allows the first trigger to happen immediately)
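Putting both review suggestions together (a zero-initialized last_run_ns_ so the first trigger fires immediately, plus an injectable clock for testing), a self-contained sketch of the class might look like this; it is illustrative, not the merged implementation:

```cpp
#include <cstdint>
#include <functional>
#include <utility>

#include "absl/time/clock.h"

// Sketch of the suggested design: last_run_ns_ defaults to zero so the
// first AbleToRun() succeeds right away, and an optional now-function
// makes the class testable without sleeping.
class Throttler {
 public:
  explicit Throttler(int64_t interval_ns,
                     std::function<int64_t()> now_fn = nullptr)
      : last_run_ns_(0), interval_ns_(interval_ns), now_fn_(std::move(now_fn)) {}

  // Returns true (and records a run) if at least interval_ns_ has elapsed
  // since the last recorded run.
  bool AbleToRun() {
    const int64_t now = Now();
    if (now - last_run_ns_ >= interval_ns_) {
      last_run_ns_ = now;
      return true;
    }
    return false;
  }

  // Records a run that happened outside AbleToRun(), e.g. a forced GC.
  void RunNow() { last_run_ns_ = Now(); }

  int64_t LastRunTime() const { return last_run_ns_; }

 private:
  int64_t Now() const {
    return now_fn_ ? now_fn_() : absl::GetCurrentTimeNanos();
  }

  int64_t last_run_ns_;
  int64_t interval_ns_;
  std::function<int64_t()> now_fn_;
};
```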
Can you rebase to fix the CI issues?
The Windows build is failing with this error:
(07:04:40) ERROR: BUILD.bazel:969:8: Couldn't build file _objs/throttler_test/throttler_test.obj: C++ compilation of rule '//:throttler_test' failed (Exit 2): cl.exe failed: error executing command
Thanks for the reminder! Sorry, I was too busy with other things and totally forgot about this :( I checked the Windows failure, and I think it's an important check; I'm surprised that Linux doesn't raise such an error.
Why are these changes needed?
To avoid object spilling when the object store is full, we trigger global GC earlier to release memory.
Related issue number
Closes #15550
Checks
I've run scripts/format.sh to lint the changes in this PR.