[core] Option to fallback to LRU on OutOfMemory #7410
Conversation
This reverts commit 44aded5.
Branch force-pushed from f899f3d to 97c1220.
This reverts commit b6359fe.
Branch force-pushed from 97c1220 to 3ed8efc.
Can one of the admins verify this patch?
python/ray/exceptions.py (Outdated)
"available with ray.init(object_store_memory=<bytes>). " | ||
"You can also try setting an option to fallback to LRU eviction " | ||
"when the object store is full by calling ray.init(" | ||
"_internal_config=json.dumps({\"object_pinning_enabled\": 0})). " |
We shouldn't ship with such an ugly message; how about ray.init(lru_evict=True) or similar?
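For illustration, here is what the two options would look like from user code. This is only a sketch: lru_evict is the reviewer's proposed name, not an implemented parameter.

```python
import json
import ray

# How this PR exposes the LRU fallback today:
ray.init(_internal_config=json.dumps({"object_pinning_enabled": 0}))

# The cleaner API proposed in review (hypothetical at this point):
# ray.init(lru_evict=True)
```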
```cpp
store_client_.Create(object_id.ToPlasmaId(), evict_if_full, data_size,
                     metadata ? metadata->Data() : nullptr,
                     metadata ? metadata->Size() : 0, &arrow_buffer);
// Always try to evict after the first attempt.
```
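To make the retry semantics concrete, here is a minimal Python sketch of the loop this snippet sits in, assuming a create() callable that returns success. The function name, retry count, and fixed delay are illustrative, not Ray's actual internals (the real delay is configurable and debated below).

```python
import time

def create_with_fallback(create, object_id, num_retries=10, delay_s=1.0):
    # Sketch only: the first attempt never evicts; once it has failed,
    # every retry passes evict_if_full=True ("always try to evict after
    # the first attempt").
    evict_if_full = False
    for _ in range(num_retries):
        if create(object_id, evict_if_full):
            return True
        evict_if_full = True
        time.sleep(delay_s)
    return False
```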
This is a little bit scary because it seems plausible to me that the global GC broadcast + python GC + unpin message + unpinning from plasma could take over 1s on a busy cluster. Maybe we should only evict in the last attempt of the exponential backoff?
I think evicting after a short timeout is the right decision here since it ensures reasonable performance under moderate memory pressure. If the cluster is under such high memory pressure that the timing matters, I don't think the workload will be stable under any settings.
That's a good point, but 1s still feels low enough that CPU/network contention delaying messages could cause eviction to happen without "real" memory pressure that could be solved by a gc.collect().
I think we can resolve that by periodically running gc.collect(), perhaps every 5 minutes. That would avoid ever hitting 100% memory usage under normal circumstances.
Perhaps we can also choose a middle value like 5 seconds before eviction starts. 30 seconds is much too long.
Agreed, 30s is definitely too long.
Maybe we could have a higher timeout for the first time we fall back to LRU (e.g., 5s) and then reduce after the first time LRU is necessary. Seems like a good compromise to avoid the condition I mentioned but still have reasonable perf for workloads with high memory pressure/that work with LRU (e.g., ipython)
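A tiny sketch of that idea, with all names and constants hypothetical: keep a longer delay until LRU eviction has fired once, then shorten it.

```python
class LruFallbackDelay:
    """Hypothetical policy: 5s before the first LRU fallback, 1s afterwards."""

    def __init__(self, first_delay_s=5.0, later_delay_s=1.0):
        self.first_delay_s = first_delay_s
        self.later_delay_s = later_delay_s
        self.lru_used = False

    def next_delay_s(self):
        # Calling this means we are about to fall back to LRU; subsequent
        # fallbacks use the shorter delay.
        if self.lru_used:
            return self.later_delay_s
        self.lru_used = True
        return self.first_delay_s
```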
Hmm, I'm okay with extending the LRU to a longer time, but it'd be good to choose the exact number and behavior afterwards based on some real measurements; otherwise it's just guesswork and it will complicate the code unnecessarily.
As a start, how about we just increase object_store_full_initial_delay_ms to at least 5s in the startup config if object pinning is disabled?
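If that key can be set the same way as object_pinning_enabled above (an assumption; the exact mechanism isn't shown in this thread), the startup config would look something like:

```python
import json
import ray

# Assumption: object_store_full_initial_delay_ms is settable through
# _internal_config like object_pinning_enabled; 5000 ms is the value
# floated above.
ray.init(_internal_config=json.dumps({
    "object_pinning_enabled": 0,
    "object_store_full_initial_delay_ms": 5000,
}))
```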
That would effectively mean that every put would take 5s once the object store fills in the ipython use case, right? If that's the case then I'd say let's just leave it at 1s for now.
LGTM
This reverts commit 98f01c6.
Why are these changes needed?
Object pinning will evict objects based on the current ref count. If object pinning is off, each plasma store evicts objects independently in LRU order. This PR modifies the behavior when object_pinning_enabled is off so that we trigger GC at all worker nodes before we fall back to LRU, favoring eviction of unreachable objects over objects still in use.

TODO: Upgrade to plasma once this PR is merged.
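In outline, the new fallback path looks like this (a sketch only; function names are illustrative, not Ray's actual internals):

```python
def on_object_store_full(worker_nodes, try_create):
    # 1. Ask every worker node to run garbage collection so unreachable
    #    ObjectRefs are released and their objects freed from plasma.
    for node in worker_nodes:
        node.trigger_global_gc()
    # 2. Retry the create; only if it still fails does the plasma store
    #    evict objects in LRU order.
    if not try_create(evict_if_full=False):
        try_create(evict_if_full=True)
```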
Checks
I've run scripts/format.sh to lint the changes in this PR.