[core] Eagerly evict objects that are no longer in scope #7220
Can one of the admins verify this patch?
Test FAILed.
Looks great!
Might've preferred first having a version that doesn't batch at all but given that it's implemented this seems good.
@@ -312,6 +314,12 @@ void NodeManager::Heartbeat() {
     last_debug_dump_at_ms_ = now_ms;
   }

   // Evict all copies of freed objects from the cluster.
   if (free_objects_period_ > 0 &&
Doing this in the heartbeat means the flush period needs to be much larger than, or a multiple of, the heartbeat period. It might be better to just have a separate timer on the event loop. Otherwise, this should be indicated in the comments in the config.
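To illustrate the granularity concern, here is a minimal sketch of the effective flush interval when the flush condition is only evaluated once per heartbeat. It assumes the check is "elapsed time since the last flush >= configured period", which the truncated diff above does not confirm; the function name `EffectiveFlushPeriodMs` is hypothetical and not part of this PR.

```cpp
#include <cstdint>

// Hypothetical helper, not from the PR.
// Returns the time between two flushes when the flush condition is checked
// only on heartbeat ticks. Assumes both periods are positive and that the
// previous flush happened exactly on a heartbeat tick.
int64_t EffectiveFlushPeriodMs(int64_t flush_period_ms, int64_t heartbeat_period_ms) {
  // Round the configured period up to a whole number of heartbeats.
  const int64_t ticks =
      (flush_period_ms + heartbeat_period_ms - 1) / heartbeat_period_ms;
  return ticks * heartbeat_period_ms;
}
```

Under these assumptions, a 150 ms flush period with a 100 ms heartbeat behaves like a 200 ms flush period, and any flush period shorter than the heartbeat behaves like one heartbeat.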
Is object_pinning_enabled on by default now? Is there an API to add an object to the "list of objects to free" that will get evicted at the time or batch-size threshold mentioned in this PR? Thanks for the work on this.
Yes, object_pinning_enabled is on by default, I believe as of the last release.
You could call
TestMemoryScheduling::testTuneWorkerHeapLimit failure looks like it's from master.
Why are these changes needed?
Add an option to eagerly evict copies of objects that are no longer in scope, according to the owner of the object, to reduce plasma's memory footprint. The eviction is done by the raylets. When the raylet that is pinning the object ID hears from the owner that it is OK to unpin, the raylet adds the object ID to a list of objects to free. Once the list reaches a configured size or has not been flushed after a configured time, the list will be flushed by sending a `FreeObjects` request to all other object managers. This in turn triggers a `Delete` of the object in plasma.

This PR leaves the feature off by default. To enable it, make sure `object_pinning_enabled` is on, then set `free_objects_period_milliseconds` to a non-negative value in the backend config. This will set the time period between attempts to free objects. `free_objects_batch_size` sets the maximum size before the list is flushed. Note that if the application uses serialized `ObjectID`s, i.e. an object ID that is created in one process and another process is given a reference to it, then it is recommended that `distributed_ref_counting_enabled` is also turned on, or else the application may receive spurious "object lost" errors.

Example to enable:
Checks
`scripts/format.sh` to lint the changes in this PR.
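The flush rule from the description (flush once the list reaches a configured size, or once a configured period has passed since the last flush) can be sketched roughly as follows. This is a standalone illustration, not the code in this PR and not the missing config example: `FreeObjectsBatcher`, `ObjectId`, `Add`, and `Tick` are hypothetical names, the flush callback stands in for sending `FreeObjects` to the other object managers, and the period is assumed to be non-negative (feature enabled).

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object ID type.
using ObjectId = std::string;

// Collects IDs of objects that may be freed and hands them to a callback
// in batches. Assumes period_ms >= 0 and batch_size >= 1.
class FreeObjectsBatcher {
 public:
  using FlushFn = std::function<void(const std::vector<ObjectId> &)>;

  FreeObjectsBatcher(int64_t period_ms, std::size_t batch_size, FlushFn flush_fn,
                     int64_t now_ms)
      : period_ms_(period_ms),
        batch_size_(batch_size),
        flush_fn_(std::move(flush_fn)),
        last_flush_ms_(now_ms) {}

  // Called when the owner reports that the object may be unpinned.
  void Add(const ObjectId &id, int64_t now_ms) {
    pending_.push_back(id);
    if (pending_.size() >= batch_size_ || now_ms - last_flush_ms_ >= period_ms_) {
      Flush(now_ms);
    }
  }

  // Called periodically, e.g. from a timer or a heartbeat.
  void Tick(int64_t now_ms) {
    if (!pending_.empty() && now_ms - last_flush_ms_ >= period_ms_) {
      Flush(now_ms);
    }
  }

 private:
  void Flush(int64_t now_ms) {
    flush_fn_(pending_);  // Stand-in for broadcasting a free request.
    pending_.clear();
    last_flush_ms_ = now_ms;
  }

  int64_t period_ms_;
  std::size_t batch_size_;
  FlushFn flush_fn_;
  int64_t last_flush_ms_;
  std::vector<ObjectId> pending_;
};
```

With a period of 0 every `Add` flushes immediately, which is one plausible reading of "non-negative" in the description. Passing the current time in explicitly keeps the sketch deterministic and easy to test.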