Ray 0.7.3 Release Notes
Highlights
- RLlib ModelV2 API is ready to use. It improves support for Keras and RNN models, as well as allowing object-oriented reuse of variables. The ModelV1 API is deprecated. No migration is needed.
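For illustration, here is a minimal sketch of a custom model under the new API, assuming the `TFModelV2` interface from the RLlib docs (the class, layer sizes, and registration name are illustrative, and import paths may vary by version):

```python
import tensorflow as tf

from ray.rllib.models import ModelCatalog
from ray.rllib.models.tf.tf_modelv2 import TFModelV2


class MyKerasModel(TFModelV2):
    """Illustrative fully connected policy/value network."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        super(MyKerasModel, self).__init__(
            obs_space, action_space, num_outputs, model_config, name)
        inputs = tf.keras.layers.Input(shape=obs_space.shape)
        hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
        logits = tf.keras.layers.Dense(num_outputs)(hidden)
        value = tf.keras.layers.Dense(1)(hidden)
        self.base_model = tf.keras.Model(inputs, [logits, value])
        # ModelV2 tracks variables explicitly, enabling object-oriented reuse.
        self.register_variables(self.base_model.variables)

    def forward(self, input_dict, state, seq_lens):
        logits, self._value_out = self.base_model(input_dict["obs"])
        return logits, state

    def value_function(self):
        return tf.reshape(self._value_out, [-1])


# Register so algorithms can reference it via {"model": {"custom_model": ...}}.
ModelCatalog.register_custom_model("my_keras_model", MyKerasModel)
```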
- `ray.experimental.sgd.pytorch.PyTorchTrainer` is ready for early adopters. Check out the documentation here. We welcome your feedback!

```python
# Import paths are assumed from the module named above.
from ray.experimental.sgd.pytorch import PyTorchTrainer, utils
from ray.experimental.sgd.pytorch.utils import Resources

# YourPyTorchModel, YourTrainingSet, YourValidationSet, and NUM_EPOCHS are
# placeholders for user-provided code.
model_creator = lambda config: YourPyTorchModel()
data_creator = lambda config: (YourTrainingSet(), YourValidationSet())

trainer = PyTorchTrainer(
    model_creator,
    data_creator,
    optimizer_creator=utils.sgd_mse_optimizer,
    config={"lr": 1e-4},
    num_replicas=2,
    resources_per_replica=Resources(num_gpus=1),
    batch_size=16,
    backend="auto")

for i in range(NUM_EPOCHS):
    trainer.train()
```
- You can query all the clients that have performed `ray.init` to connect to the current cluster with `ray.jobs()`. #5076

```python
>>> ray.jobs()
[{'JobID': '02000000',
  'NodeManagerAddress': '10.99.88.77',
  'DriverPid': 74949,
  'StartTime': 1564168784,
  'StopTime': 1564168798},
 {'JobID': '01000000',
  'NodeManagerAddress': '10.99.88.77',
  'DriverPid': 74871,
  'StartTime': 1564168742}]
```
Core
RLlib
- Finished porting all major RLlib algorithms to the builder pattern. #5277, #5258, #5249
- `learner_queue_timeout` can be configured for the async sample optimizer. #5270
- `reproducible_seed` can be used for reproducible experiments (a config sketch follows this list). #5197
- Added entropy coefficient decay to IMPALA, APPO, and PPO. #5043
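For illustration, a hypothetical sketch of setting the two new keys on an algorithm that uses the async sample optimizer, such as IMPALA; the trainer class is real, but the values are illustrative and the exact key names and defaults are defined in the PRs above:

```python
from ray.rllib.agents.impala import ImpalaTrainer

trainer = ImpalaTrainer(
    env="CartPole-v0",
    config={
        # Seconds to wait on the learner queue before timing out (see #5270).
        "learner_queue_timeout": 300,
        # Fixed seed for reproducible experiments (see #5197).
        "reproducible_seed": 0,
    })
```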
Tune:
- Breaking: `ExperimentAnalysis` is now returned by default from `tune.run`. To obtain a list of trials, use `analysis.trials` (a usage sketch follows this list). #5115
- Breaking: Syncing behavior between head and workers can now be customized (`sync_to_driver`). Syncing behavior (`upload_dir`) between cluster and cloud is now separately customizable (`sync_to_cloud`). This changes the structure of the uploaded directory - now `local_dir` is synced with `upload_dir`. #4450
- Introduce `Analysis` and `ExperimentAnalysis` objects. The `Analysis` object returns all trials in a folder; `ExperimentAnalysis` is a subclass that returns all trials of an experiment. #5115
- Add missing argument `tune.run(keep_checkpoints_num=...)`. Enables keeping only the last N checkpoints. #5117
- Trials on failed nodes will be prioritized in processing. #5053
- Trial checkpointing is now more flexible. #4728
- Add system performance tracking for GPU, RAM, VRAM, and CPU usage statistics - toggle with `tune.run(log_sys_usage=True)`. #4924
- Experiment checkpointing is now less frequent, and its period can be controlled with `tune.run(global_checkpoint_period=...)`. #4859
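Putting the Tune changes together, here is a minimal sketch of the updated `tune.run` interface, assuming a toy function trainable; the metric and all values are illustrative:

```python
from ray import tune


def trainable(config, reporter):
    # A dummy training loop; function trainables report metrics via `reporter`.
    for step in range(10):
        reporter(mean_loss=config["lr"] / (step + 1))


# Breaking change: tune.run now returns an ExperimentAnalysis object.
analysis = tune.run(
    trainable,
    config={"lr": tune.grid_search([0.1, 1.0])},
    keep_checkpoints_num=3,        # keep only the last 3 checkpoints (#5117)
    log_sys_usage=True,            # log cpu/gpu/ram/vram usage stats (#4924)
    global_checkpoint_period=60,   # experiment-level checkpoint period (#4859)
)
trials = analysis.trials  # the list of trials formerly returned by tune.run
```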
Autoscaler
- Add a `request_cores` function for manual autoscaling. You can now manually request resources for the autoscaler (a hypothetical usage sketch follows this list). #4754
- Local cluster:
- Improved logging with AWS NodeProvider. `create_instance` calls will be logged. #4998
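The new function could be invoked along these lines; note that the import path and signature below are assumptions for illustration, not confirmed by the release note (see #4754 for the actual API):

```python
# Hypothetical sketch: module path and argument are assumed (see #4754).
from ray.autoscaler.commands import request_cores

# Ask the autoscaler to provision enough nodes to cover 8 additional cores.
request_cores(8)
```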
Other Libraries:
- SGD:
- Kubernetes: Ray namespace added for k8s. #4111
- Dev experience: Add linting pre-push hook. #5154
Thanks:
We thank the following contributors for their amazing contributions:
@joneswong, @1beb, @richardliaw, @pcmoritz, @raulchen, @stephanie-wang, @jiangzihao2009, @LorenzoCevolani, @kfstorm, @pschafhalter, @micafan, @simon-mo, @vipulharsh, @haje01, @ls-daniel, @hartikainen, @stefanpantic, @edoakes, @llan-ml, @alex-petrenko, @ztangent, @gravitywp, @MQQ, @Dulex123, @morgangiraud, @antoine-galataud, @robertnishihara, @qxcv, @vakker, @jovany-wang, @zhijunfu, @ericl