Make RayExecutor use the current placement group if one exists #3134

Yard1 · 2021-08-27T16:29:20Z

Checklist before submitting

Did you read the contributor guide?
Did you update the docs?
Did you write any tests to validate this change?
Did you update the CHANGELOG, if this change affects users?

Description

Adds a PGStrategy to RayExecutor that automatically captures the current placement group, if one exists, and uses it for Horovod.
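The "inherit the current placement group if there is one, otherwise create a new one" behavior can be sketched without Ray. This is a minimal illustration, not the code in this PR: `FakePlacementGroup`, `get_current_pg`, `running_inside`, and `resolve_placement_group` are hypothetical stand-ins, and the module-level slot only imitates the lookup that the diff in this PR performs with `get_current_placement_group()`.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class FakePlacementGroup:
    """Hypothetical stand-in for a Ray placement group (not the Ray class)."""
    bundles: List[Dict[str, float]] = field(default_factory=list)
    strategy: str = "PACK"


# Plays the role of "the placement group this code is running inside".
# None means the caller is not inside any placement group.
_current_pg: Optional[FakePlacementGroup] = None


def get_current_pg() -> Optional[FakePlacementGroup]:
    """Return the enclosing placement group, or None if there is none."""
    return _current_pg


@contextmanager
def running_inside(pg: FakePlacementGroup) -> Iterator[None]:
    """Temporarily pretend that the caller runs inside `pg`."""
    global _current_pg
    previous, _current_pg = _current_pg, pg
    try:
        yield
    finally:
        _current_pg = previous


def resolve_placement_group(
    num_workers: int,
    cpus_per_worker: int = 1,
    use_current_placement_group: bool = True,
) -> Tuple[FakePlacementGroup, bool]:
    """Return (placement_group, created_new)."""
    if use_current_placement_group:
        current = get_current_pg()
        if current is not None:
            # Inherit the enclosing group as-is, whatever its strategy.
            return current, False
    # Fall back to a fresh group with one bundle per worker (assumed layout).
    bundles = [{"CPU": cpus_per_worker} for _ in range(num_workers)]
    return FakePlacementGroup(bundles=bundles, strategy="PACK"), True


outer = FakePlacementGroup(bundles=[{"CPU": 4}], strategy="SPREAD")
with running_inside(outer):
    inherited, created = resolve_placement_group(num_workers=2)
    print(inherited is outer, created)
```

Passing `use_current_placement_group=False` skips the lookup, which mirrors the opt-out flag discussed later in this thread.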

Review process to land

All tests and other checks must succeed.
At least one member of the technical steering committee must review and approve.
If any member of the technical steering committee requests changes, they must be addressed.

Yard1 · 2021-08-27T16:44:05Z

@richardliaw @tgaddair

github-actions · 2021-08-27T18:45:51Z

Unit Test Results

    766 files ±0     766 suites ±0 6h 7m 5s ⏱️ ±0s
    701 tests ±0     655 ✔️ ±0     46 💤 ±0 0 ❌ ±0
16 475 runs ±0 11 494 ✔️ ±0 4 981 💤 ±0 0 ❌ ±0

Results for commit adce8fa. ± Comparison against base commit adce8fa.

♻️ This comment has been updated with latest results.

richardliaw

It'd be nice to reduce duplication somehow.

richardliaw · 2021-08-27T19:10:13Z

horovod/ray/strategy.py

+        self.cpus_per_worker = cpus_per_worker
+        self.gpus_per_worker = gpus_per_worker or 1
+        self.use_gpu = use_gpu
+        self.placement_group = placement_group or get_current_placement_group()


Is the main change that you can pass in a placement group? Is it possible to implement this as a parameter of the above colocated / pack strategy?

I feel like this strategy is logically different enough to warrant a separate class. I'll see if I can reuse more code

Hmm, isn't it the same as the pack strategy, except you just take in a placement group instead?

That's true, but PackStrategy implies that it will use PACK, while the placement group passed can use any strategy. Maybe I can inherit from PackStrategy instead?

Hmm, interesting. I guess basically you want a strategy that is agnostic to the placement group creation (and by default, submits to the given placement group).

I guess so, yeah: it could be rolled into PackStrategy, though it would probably need a rename in that case.

ColocatedStrategy is a bit of a special case as it requires the bundles to be strictly spread out, so that may be left as is.
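To make the distinction in this thread concrete, here is a rough sketch of the two bundle layouts being contrasted. It assumes one bundle per worker for the pack case and one bundle per host for the colocated case; the helper names `pack_layout` and `colocated_layout` are hypothetical, and the real Horovod classes may build their bundles differently. `"PACK"` and `"STRICT_SPREAD"` are Ray placement group strategy names.

```python
from typing import Dict, List, Tuple

Bundle = Dict[str, float]


def _worker_resources(cpus_per_worker: float, gpus_per_worker: float) -> Bundle:
    """Resources requested by a single worker."""
    resources: Bundle = {"CPU": cpus_per_worker}
    if gpus_per_worker:
        resources["GPU"] = gpus_per_worker
    return resources


def pack_layout(
    num_workers: int, cpus_per_worker: float = 1, gpus_per_worker: float = 0
) -> Tuple[str, List[Bundle]]:
    """One bundle per worker; the scheduler may pack them onto few nodes."""
    per_worker = _worker_resources(cpus_per_worker, gpus_per_worker)
    return "PACK", [dict(per_worker) for _ in range(num_workers)]


def colocated_layout(
    num_hosts: int,
    num_workers_per_host: int,
    cpus_per_worker: float = 1,
    gpus_per_worker: float = 0,
) -> Tuple[str, List[Bundle]]:
    """One bundle per host, each sized for all workers on that host.

    STRICT_SPREAD requires every bundle to land on a different node.
    """
    per_worker = _worker_resources(cpus_per_worker, gpus_per_worker)
    per_host = {
        name: amount * num_workers_per_host for name, amount in per_worker.items()
    }
    return "STRICT_SPREAD", [dict(per_host) for _ in range(num_hosts)]


print(pack_layout(num_workers=4, cpus_per_worker=2))
print(colocated_layout(num_hosts=2, num_workers_per_host=2, cpus_per_worker=2))
```

An inherited placement group fits neither helper, since its strategy and bundles were chosen by whoever created it; that is the reason given above for treating it as a separate case.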

There is one argument for keeping the structure as-is, though: it makes the strategy selection clearer. This is how it looks right now:

    def _create_strategy(self):
        assert self.num_workers is None or self.num_hosts is None
        if self.use_current_placement_group:
            try:
                # Will try to get the current PG, otherwise
                # will raise RuntimeError
                strategy = PGStrategy(
                    settings=self.settings,
                    num_workers=(self.num_workers if self.num_workers else
                                 self.num_hosts * self.num_workers_per_host),
                    use_gpu=self.use_gpu,
                    cpus_per_worker=self.cpus_per_worker,
                    gpus_per_worker=self.gpus_per_worker
                )
                logger.info(
                    "Found an existing placement group, inheriting. "
                    "You can disable this behavior by setting "
                    "`use_current_placement_group=False`."
                )
                return strategy
            except RuntimeError:
                pass
        if self.num_workers:
            return PackStrategy(
                settings=self.settings,
                num_workers=self.num_workers,
                use_gpu=self.use_gpu,
                cpus_per_worker=self.cpus_per_worker,
                gpus_per_worker=self.gpus_per_worker)
        else:
            return ColocatedStrategy(
                settings=self.settings,
                num_hosts=self.num_hosts,
                num_workers_per_host=self.num_workers_per_host,
                use_gpu=self.use_gpu,
                cpus_per_worker=self.cpus_per_worker,
                gpus_per_worker=self.gpus_per_worker)

I feel like this makes the logic clearer at a glance, without having to go into the strategies and check what happens inside them.

I was thinking maybe we can do:

    def _create_strategy(self):
        if strategy == "Colocated":
            return ColocatedStrategy(**kwargs)
        else:
            return PackStrategy(
                use_current_placement_group=self.use_current_placement_group,
                **kwargs)
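A runnable sketch of that suggestion, with the flag folded into the pack strategy, might look like the following. The two dataclasses are simplified stand-ins rather than the Horovod classes, `create_strategy` is a hypothetical free function, and the "exactly one of `num_workers` / `num_hosts`" check is an assumption that is stricter than the assertion shown earlier in this thread.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ColocatedStrategy:
    """Stand-in: a fixed number of workers on each of `num_hosts` hosts."""
    num_hosts: int
    num_workers_per_host: int


@dataclass
class PackStrategy:
    """Stand-in: `num_workers` workers, optionally inheriting the current PG."""
    num_workers: int
    use_current_placement_group: bool = True


def create_strategy(
    num_workers: Optional[int] = None,
    num_hosts: Optional[int] = None,
    num_workers_per_host: int = 1,
    use_current_placement_group: bool = True,
) -> Union[ColocatedStrategy, PackStrategy]:
    """Pick a strategy from the executor arguments."""
    # Assumption for this sketch: exactly one sizing argument must be given.
    if (num_workers is None) == (num_hosts is None):
        raise ValueError("Set exactly one of num_workers or num_hosts.")
    if num_hosts is not None:
        # Colocated placement stays a separate, strictly spread special case.
        return ColocatedStrategy(num_hosts, num_workers_per_host)
    # The pack strategy carries the flag and decides internally whether to
    # inherit an existing placement group or create its own.
    return PackStrategy(int(num_workers), use_current_placement_group)


print(create_strategy(num_workers=4))
print(create_strategy(num_hosts=2, num_workers_per_host=2))
```

The trade-off raised above still applies: the selection function becomes shorter, but a reader has to open the pack strategy to see when an existing placement group is reused.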


richardliaw

A couple nits; overall looks good. Ping me when tests pass.

Yard1 · 2021-08-31T13:27:53Z

@richardliaw fixed the test, can you rerun?


github-actions · 2021-09-03T02:48:38Z

Unit Test Results (with flaky tests)

    903 files ±0     903 suites ±0 6h 27m 50s ⏱️ ±0s
    701 tests ±0     655 ✔️ ±0     45 💤 ±0 1 ❌ ±0
19 321 runs ±0 13 148 ✔️ ±0 6 172 💤 ±0 1 ❌ ±0

For more details on these failures, see this check.

Results for commit adce8fa. ± Comparison against base commit adce8fa.

♻️ This comment has been updated with latest results.


Yard1 force-pushed the horovod_ray_inherit_pg branch 2 times, most recently from 5068c21 to f3e8299 August 27, 2021 16:43

Yard1 marked this pull request as ready for review August 27, 2021 16:43

Yard1 changed the title ~~Add PGStrategy to Horovod-Ray~~ Make RayExecutor use the current placement group if one exists Aug 27, 2021

richardliaw requested changes Aug 27, 2021


richardliaw reviewed Aug 30, 2021

horovod/ray/strategy.py (outdated, resolved)

richardliaw reviewed Aug 30, 2021

horovod/ray/runner.py (outdated, resolved)

richardliaw reviewed Aug 30, 2021


Yard1 force-pushed the horovod_ray_inherit_pg branch 2 times, most recently from 3f7603b to ea99366 August 31, 2021 13:27

Yard1 force-pushed the horovod_ray_inherit_pg branch from 75009db to 6d9ab24 August 31, 2021 20:41

richardliaw approved these changes Aug 31, 2021


richardliaw reviewed Aug 31, 2021

CHANGELOG.md (outdated, resolved)

Yard1 added 14 commits September 1, 2021 21:53

Add PGStrategy to Horovod-Ray

52c47cf

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Add test

4c0b2d2

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Add arg to capture or not

0f8d49c

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Update changelog

379064e

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Fix

32f4290

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Nit

758b706

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Remove debug

2abdfb2

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Nit

de5ac16

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Fix test

3a0a4b3

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Reuse code

6c626f4

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Nit

1e2ab35

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Improve strategy selection

8af7343

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Combine into PackStrategy

ce931c2

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Nit

41816e7

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Yard1 added 6 commits September 1, 2021 21:53

Fix test

4b12420

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Improve test

528baf4

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Fix docs

573d39f

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Update changelog

d2e8042

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Fix num of workers

49f659d

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Ensure num_workers is an int

be3e4aa

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Yard1 force-pushed the horovod_ray_inherit_pg branch from c20d861 to be3e4aa September 1, 2021 21:53

Yard1 marked this pull request as draft September 1, 2021 22:12

Yard1 marked this pull request as ready for review September 1, 2021 22:30

Nits

ca13e19

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Yard1 force-pushed the horovod_ray_inherit_pg branch from 0be5c61 to ca13e19 September 1, 2021 22:32

Yard1 mentioned this pull request Sep 2, 2021

Add Ray backend to Ray hyperopt ludwig-ai/ludwig#1269

Merged

tgaddair merged commit adce8fa into horovod:master Sep 2, 2021

Yard1 deleted the horovod_ray_inherit_pg branch September 2, 2021 23:59

weihanmines pushed a commit to weihanmines/horovod that referenced this pull request Dec 11, 2021

Make RayExecutor use the current placement group if one exists (horov…

dddaff8

…od#3134) Signed-off-by: weihanmines <weihan13@amd.com>

amogkam mentioned this pull request Jan 6, 2022

Horovod updates and cleanup ray-project/ray_lightning#71

Merged
