Enable distributed ref counting by default #7628
Can we also enable eager eviction at the same time?
Can one of the admins verify this patch?
Test PASSed.
Note that I had to add a DrainAndShutdown method to the ReferenceCounter, similar to the one for TaskManager. This ensures that a worker doing a clean exit while it still owns some objects waits for their refs to go out of scope before exiting. This became necessary because this PR now enables eager eviction.
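To illustrate the drain-then-exit behavior described above, here is a minimal Python sketch. It is a toy model, not the actual C++ ReferenceCounter: the class name `ToyReferenceCounter` and the methods `add_owned_ref`, `remove_ref`, and `drain_and_shutdown` are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, Optional


class ToyReferenceCounter:
    """Toy stand-in for a worker's reference counter (hypothetical, not Ray's class)."""

    def __init__(self) -> None:
        # Object ID -> number of references still in scope for objects we own.
        self._owned: Dict[str, int] = {}
        # Shutdown callback waiting for all owned refs to go out of scope.
        self._on_drained: Optional[Callable[[], None]] = None

    def add_owned_ref(self, object_id: str) -> None:
        self._owned[object_id] = self._owned.get(object_id, 0) + 1

    def remove_ref(self, object_id: str) -> None:
        self._owned[object_id] -= 1
        if self._owned[object_id] == 0:
            del self._owned[object_id]
        # If a shutdown is pending and nothing is owned anymore, finish exiting.
        if not self._owned and self._on_drained is not None:
            callback, self._on_drained = self._on_drained, None
            callback()

    def drain_and_shutdown(self, shutdown: Callable[[], None]) -> None:
        """Run `shutdown` immediately if nothing is owned, otherwise defer it."""
        if not self._owned:
            shutdown()
        else:
            self._on_drained = shutdown


events = []
counter = ToyReferenceCounter()
counter.add_owned_ref("obj1")
counter.drain_and_shutdown(lambda: events.append("exit"))  # deferred: obj1 still in scope
counter.remove_ref("obj1")  # last ref released, so the deferred shutdown runs now
```

The point of deferring the callback is that the owner stays alive for as long as any of its objects is still referenced, which matters once objects are evicted eagerly.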
Test FAILed.
@stephanie-wang Sounds great. Just a few questions on this:
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
…h_set::erase (ray-project#7633) Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
One limitation is that workloads that previously relied on LRU eviction to function (e.g., workloads like IPython that keep references around forever even if they aren't used) will not work (and the same goes for the previous two releases). However, they should be able to revert to the old LRU behavior by turning off the corresponding flag. We have not seen substantial performance regressions, but we will keep an eye out for this while running the release tests (cc @simon-mo).
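The difference between the two eviction policies can be sketched with a toy object store. This is an illustrative model under simplifying assumptions (fixed object count as capacity, one reference per object), not Ray's plasma store; `ToyObjectStore`, `StoreFullError`, `put`, and `release` are hypothetical names.

```python
from collections import OrderedDict


class StoreFullError(Exception):
    """Raised when the store is full and no object may be evicted."""


class ToyObjectStore:
    """Toy store holding at most `capacity` objects (hypothetical, not plasma)."""

    def __init__(self, capacity: int, eager_eviction: bool) -> None:
        self.capacity = capacity
        self.eager_eviction = eager_eviction
        # Object ID -> ref count, ordered from oldest to newest.
        self.objects = OrderedDict()

    def put(self, object_id: str) -> None:
        if len(self.objects) >= self.capacity:
            if self.eager_eviction:
                # Everything still stored is referenced, so nothing can be evicted.
                raise StoreFullError(object_id)
            # LRU mode: drop the oldest object even if it is still referenced.
            self.objects.popitem(last=False)
        self.objects[object_id] = 1

    def release(self, object_id: str) -> None:
        if object_id not in self.objects:
            return  # already evicted (possible in LRU mode)
        self.objects[object_id] -= 1
        if self.eager_eviction and self.objects[object_id] == 0:
            # Eager mode: evict as soon as the ID is out of scope everywhere.
            del self.objects[object_id]


# A session that never releases its references, like the IPython case above.
lru_store = ToyObjectStore(capacity=2, eager_eviction=False)
for oid in ("a", "b", "c"):
    lru_store.put(oid)  # "a" is silently evicted to make room for "c"
```

In this model, the same loop against a store created with `eager_eviction=True` raises `StoreFullError` on the third `put`, because none of the references were released.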
@stephanie-wang the diff is showing as +1314/-813 LOC, not sure why.
Please see the design doc for this.
On my laptop, the PR for eager eviction (#7220) seemed to incur ~10% overhead for workloads with many plasma objects. The goal is to have this in for the next Ray release so we can run the performance benchmarks from the release testing.
…nto enable-ref-counting
Why are these changes needed?
Turn on distributed ref counting by default. This can be turned off with:
This also turns on eager eviction for objects by default. This means that all copies of an object will be proactively evicted once the ID is no longer in scope on any workers. This can be turned off with:
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.