[serve] Remove BackendConfig broadcasting #19154
Conversation
Yup. Totally agree this is the right track. For the Java backend prototype, I ended up adding a structure similar to ReplicaInfo.
python/ray/serve/version.py (Outdated)

```python
# Set the hash seed to ensure the hashes are consistent across processes.
# We should probably use a purpose-built hashing algorithm like CRC32 instead.
os.environ["PYTHONHASHSEED"] = "0"
```
cc @jiaodong discovered this while working on this PR -- the user_config hashes are different across processes, so the recovery logic is likely not working for these...
Wow, this was quite subtle, but great catch. Let's follow up more in Slack, as it seems we need more thorough testing and hardening around FT stuff.
Let's categorize this as tech debt and add an issue to the tech debt milestone? Setting PYTHONHASHSEED is generally bad, as this might also set it for whatever user code depends on:
> Hash randomization is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst-case performance of a dict construction, O(n²) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
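As a quick illustration of the problem discussed in this thread (this snippet is illustrative only and not part of the PR; `str_hash_in_subprocess` is a hypothetical helper), the built-in `hash()` of the same string is only reproducible across processes when `PYTHONHASHSEED` is pinned:

```python
import os
import subprocess
import sys


def str_hash_in_subprocess(seed: str) -> int:
    """Return hash('user_config') as computed in a fresh interpreter."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('user_config'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout)


# With a pinned seed, the hash is stable across processes.
assert str_hash_in_subprocess("0") == str_hash_in_subprocess("0")

# With randomization enabled ("random"), two fresh processes almost always
# disagree, which is why a hash() computed in one process cannot be compared
# against a hash() recovered in another.
print(str_hash_in_subprocess("random") != str_hash_in_subprocess("random"))
```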
@simon-mo take a look at the updated pickle.dumps version?
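For context, one way to make the version hash process-independent (a sketch of the pickle.dumps approach mentioned above; `config_version` is a hypothetical name, not the PR's actual helper) is to run the pickled bytes through a deterministic digest instead of the built-in `hash()`:

```python
import hashlib
import pickle


def config_version(user_config) -> str:
    # Unlike hash(), sha256 over the pickled bytes does not depend on
    # PYTHONHASHSEED, so the result is comparable across processes.
    # Caveat: pickle preserves dict insertion order, so two equal dicts
    # built in different orders would still yield different digests.
    return hashlib.sha256(pickle.dumps(user_config)).hexdigest()


# The same config contents always produce the same version string.
assert config_version({"threshold": 0.5}) == config_version({"threshold": 0.5})
```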
@simon-mo this is ready for review now
LGTM. Two minor comments about replica.py code changes.
Why are these changes needed?
Depends on: #19145
Instead of broadcasting max_concurrent_queries at the deployment level, this tacks it on as additional metadata for each individual replica.
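A minimal sketch of the idea (field names are illustrative, not Ray Serve's actual ReplicaInfo): each replica record carries its own max_concurrent_queries, so no deployment-wide broadcast is needed.

```python
from dataclasses import dataclass


@dataclass
class ReplicaInfo:
    # Hypothetical per-replica metadata; the real Ray Serve class may
    # differ. max_concurrent_queries travels with each replica instead
    # of being broadcast once per deployment via BackendConfig.
    replica_tag: str
    actor_handle_name: str
    max_concurrent_queries: int


replica = ReplicaInfo("deploy#abc", "SERVE_REPLICA::deploy#abc", 100)
print(replica.max_concurrent_queries)
```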
Related issue number
Closes #19147
Checks

- I've run scripts/format.sh to lint the changes in this PR.