[Object Spilling] 100GB shuffle release test #13729

Merged
merged 9 commits into from
Jan 29, 2021

rkooo567
Contributor

Why are these changes needed?

Add a single-node / 4-node streaming shuffle stress test. I also made the output pretty lol.

Screen Shot 2021-01-26 at 11 21 26 PM

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Contributor

@wuisawesome wuisawesome left a comment

Can you also add this to the release process, and include what the success condition is and any information that needs to be recorded?

release/data_processing_tests/README.md
rows_per_partition = partition_size // (8 * 2)
object_store_size = 20 * 1024 * 1024 * 1024 # 20G

system_config = {
Contributor

Won't these be set automatically once object spilling is turned on?

Contributor Author

Yes. But we'd like to run this in the current release, and object spilling will be turned off for the current release. I can create another PR to fix it later.
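
For reference, here is a minimal sketch of what setting the spilling options explicitly in the driver might look like. The config keys are the ones that appear in the commented-out `ray start` line in this PR; the helper names `build_spilling_system_config` and `start_ray_with_spilling` are hypothetical, and passing the dict through the private `_system_config` keyword of `ray.init` is an assumption about the Ray version in use, not a copy of the test script.

```python
import json


def build_spilling_system_config(spill_dir="/tmp/spill", max_io_workers=1):
    """Return a system-config dict that turns on filesystem object spilling.

    Hypothetical helper: the keys mirror the --system-config JSON quoted in
    this PR; the defaults are taken from that same line.
    """
    return {
        "automatic_object_spilling_enabled": True,
        "max_io_workers": max_io_workers,
        # The spilling config is itself a JSON string nested inside the dict,
        # which is why the CLI form needs escaped quotes.
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": spill_dir}}
        ),
    }


def start_ray_with_spilling(object_store_size=20 * 1024 ** 3):
    """Start Ray in the driver with a 20G object store and spilling enabled.

    Assumption: `_system_config` is accepted by ray.init in this Ray version.
    """
    import ray  # imported lazily so the helper above works without Ray

    return ray.init(
        object_store_memory=object_store_size,
        _system_config=build_spilling_system_config(),
    )
```

Building the dict in Python and serializing only the inner config with `json.dumps` avoids the hand-escaped quoting that the command-line form requires.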

@rkooo567
Contributor Author

Sounds good. Will update them tomorrow. If you'd like to run it asap, the success criterion for now is just that the run finishes. (I will write more details soon.)

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 27, 2021
@rkooo567 rkooo567 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 29, 2021
# Command to start ray on the head node. You don't need to change this.
head_start_ray_commands:
- ray stop
# - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml --system-config='{"automatic_object_spilling_enabled":true,"max_io_workers":1,"object_spilling_config":"{\"type\":\"filesystem\",\"params\":{\"directory_path\":\"/tmp/spill\"}}"}'
Contributor Author

We don't need it because we will start a ray instance in the driver. If you'd like to remove this, lmk (I prefer to keep it as a reference).

- ``data_processing_tests/workloads/streaming_shuffle.py`` runs the 100GB streaming shuffle on a single node and on a fake 4-node cluster.

**IMPORTANT** Check whether the workload scripts have terminated. If so, please record the results (both read/write bandwidth and the shuffle result) in ``release_logs/data_processing_tests/[test_name]``.
Neither the shuffle runtime nor the read/write bandwidth should regress by more than 15% compared to the previous release.
Contributor Author

15% might be a bit ambitious, but let's see how it looks from the next release.
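
As an illustration only, the 15% rule could be checked with something like the sketch below. The function name `regressed` and its parameters are hypothetical and not part of this PR. It assumes that a regression means lower bandwidth (higher is better) or longer runtime (lower is better), which is one reading of the README wording.

```python
def regressed(previous, current, higher_is_better, tolerance=0.15):
    """Return True if `current` is more than `tolerance` worse than `previous`.

    Hypothetical helper: `previous` and `current` are the same metric from
    the previous and current release, in the same unit.
    """
    if previous <= 0:
        raise ValueError("previous must be positive")
    if higher_is_better:
        # e.g. read/write bandwidth: worse means smaller
        return current < previous * (1 - tolerance)
    # e.g. shuffle runtime: worse means larger
    return current > previous * (1 + tolerance)
```

For example, `regressed(100.0, 80.0, higher_is_better=True)` flags a bandwidth drop from 100 to 80, while a drop to 90 stays within the threshold.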

@ericl ericl merged commit c21a79a into ray-project:master Jan 29, 2021
fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021