
[serve] Remove BackendConfig broadcasting #19154

Merged · 39 commits · Oct 13, 2021

Conversation

@edoakes (Contributor) commented Oct 6, 2021

Why are these changes needed?

Depends on: #19145

Instead of broadcasting max_concurrent_queries at the deployment level, this PR attaches it as additional metadata on each individual replica.
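
A minimal sketch of the resulting shape. ReplicaInfo is a real structure in Ray Serve, but the fields and the can_assign helper below are illustrative assumptions, not the actual definitions:

from dataclasses import dataclass

@dataclass(frozen=True)
class ReplicaInfo:
    replica_tag: str
    actor_handle: object          # would be a Ray ActorHandle in practice
    max_concurrent_queries: int   # carried per replica instead of broadcast

def can_assign(replica: ReplicaInfo, in_flight: int) -> bool:
    # The router reads the limit directly off each replica it discovers;
    # no separate long-poll channel for deployment-level config is needed.
    return in_flight < replica.max_concurrent_queries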

Related issue number

Closes #19147

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@edoakes (Contributor, Author) commented Oct 6, 2021

@simon-mo review #19145 first

@edoakes force-pushed the dont-poll-max-concurrent-queries branch from b37a0cd to 6bd3f6a on October 6, 2021 at 18:55
@edoakes force-pushed the dont-poll-max-concurrent-queries branch from 6bd3f6a to 9743854 on October 6, 2021 at 19:18
@simon-mo (Contributor) commented Oct 6, 2021

Yup, totally agree this is the right track. For the Java backend prototype, I ended up adding a similar structure to ReplicaInfo.


# Set the hash seed to ensure the hashes are consistent across processes.
# Note: hash randomization is fixed at interpreter startup, so this setting
# only takes effect in child processes that inherit the environment.
# We should probably use a purpose-built hashing algorithm like CRC32 instead.
os.environ["PYTHONHASHSEED"] = "0"
@edoakes (Contributor, Author) commented on the diff above:

cc @jiaodong: discovered this while working on this PR -- the user_config hashes differ across processes, so the recovery logic is likely not working for these...
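
A minimal illustration of the failure mode (an editor's sketch, not code from the PR): each fresh interpreter picks its own hash seed, so the same string hashes to a different value in each process.

import subprocess, sys

# Each child interpreter gets its own random hash seed (unless
# PYTHONHASHSEED is already set in the environment), so these two
# printed values will almost certainly differ.
cmd = [sys.executable, "-c", "print(hash('user_config'))"]
print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())
print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())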

A project member replied:

Wow, this was quite subtle, but great catch. Let's follow up more in Slack, as it seems we need more thorough testing and hardening around the FT stuff.

A contributor replied:

Let's categorize this as tech debt and add an issue to the tech-debt milestone? Setting PYTHONHASHSEED is generally bad, since it might also affect whatever user code depends on it:

Hash randomization is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst-case performance of dict construction, O(n²) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
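
A small sketch of the leak being warned about (illustrative, not PR code): the override is process-global, so it is inherited by any subprocess that user code spawns afterwards.

import os, subprocess, sys

os.environ["PYTHONHASHSEED"] = "0"  # the override under discussion

# Any subprocess spawned from this point inherits the fixed seed,
# disabling hash-randomization DoS protection for user code too.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PYTHONHASHSEED'])"],
    capture_output=True,
    text=True,
)
print(child.stdout.strip())  # prints: 0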

@edoakes (Contributor, Author) replied:

@simon-mo take a look at the updated pickle.dumps version?
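
For context, a minimal sketch of what a pickle.dumps-based hash could look like, echoing the CRC32 suggestion in the code comment above; the PR's actual helper may differ:

import pickle, zlib

def stable_config_hash(user_config) -> int:
    # pickle.dumps does not consult the built-in hash() for most objects,
    # so the digest is stable across processes. Caveat: the bytes are only
    # deterministic when serialization order is (sets of strings, for
    # example, may pickle in different orders under hash randomization).
    return zlib.crc32(pickle.dumps(user_config))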

@edoakes force-pushed the dont-poll-max-concurrent-queries branch from dfe0b94 to 3570f58 on October 8, 2021 at 18:07
@edoakes (Contributor, Author) commented Oct 12, 2021

@simon-mo this is ready for review now

@simon-mo (Contributor) left a review:

LGTM. Two minor comments about replica.py code changes.

(Two review threads on python/ray/serve/replica.py, both outdated and resolved.)
@edoakes merged commit 2ac81f3 into ray-project:master on Oct 13, 2021
Linked issue closed by this PR: Don't use long poll client for max_concurrent_queries (#19147)