Adding register_trainable logic to RayTuneExecutor #1117

ANarayan · 2021-03-12T07:31:40Z

This PR registers the trainable function passed to tune.run using the ray.tune utility function register_trainable. This fix is necessary to ensure that the trainable function is accessible to all Ray processes in a cluster. Moreover, when several ludwig.hyperopt calls run in parallel on the same Ray cluster, each experiment needs its own uniquely named trainable function. This prevents the underlying objects of different experiments (i.e. self.decode_ctx) from being shared. The unique trainable function name is created by generating a hash of the experiment config.
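
For readers without the diff open, here is a rough sketch of the idea, not the code merged in this PR. The helper names `trainable_name` and `register_unique_trainable`, the `trainable_func_` prefix, and the choice of SHA-256 over a JSON dump are all assumptions made for illustration. Only `tune.register_trainable` and `tune.run` are actual Ray Tune calls.

```python
import hashlib
import json


def trainable_name(config: dict, prefix: str = "trainable_func_") -> str:
    """Derive a deterministic, per-experiment name from the config (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict insertion order;
    # default=str is a fallback for values that JSON cannot encode directly.
    serialized = json.dumps(config, sort_keys=True, default=str)
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return prefix + digest


def register_unique_trainable(run_experiment, config: dict) -> str:
    """Register run_experiment under a config-specific name (hypothetical helper)."""
    # Imported lazily so the naming helper above works without Ray installed.
    from ray import tune

    name = trainable_name(config)
    tune.register_trainable(name, run_experiment)
    # The returned string can be passed to tune.run in place of the function.
    return name
```

With this shape, two hyperopt runs with different configs register under different names, so each one resolves to its own trainable rather than sharing one.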

cc: @tgaddair

tgaddair

Nice! LGTM.

ANarayan added 2 commits March 11, 2021 23:11

adding register_trainable logic to RayTuneExecutor

c02e0db

fixing minor error

1853d63

tgaddair approved these changes Mar 12, 2021

tgaddair merged commit 1aee251 into ludwig-ai:master Mar 12, 2021

ANarayan deleted the ray-tune-fix branch April 3, 2021 20:01