WIP: Introduce TrialRunner Abstraction #720

Draft
wants to merge 106 commits into base: main

bpkroth
Contributor

@bpkroth bpkroth commented Mar 19, 2024

This is another step in adding support for parallel trial execution #380.

Here we separate out the running of an individual trial into a single class: TrialRunner.

Multiple TrialRunners are instantiated at CLI invocation with the --num-trial-runners argument.
Each TrialRunner is associated with a single copy of the root Environment and is made unique by a trial_runner_id value that's included in that Environment's global_config.
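
A rough sketch of the idea described above, one runner per Environment copy with a unique trial_runner_id in its global_config. FakeEnvironment, FakeTrialRunner, and make_trial_runners are hypothetical stand-ins for illustration only, not the classes or functions in this PR:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeEnvironment:
    """Hypothetical stand-in for a root Environment (not the mlos_bench class)."""

    global_config: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeTrialRunner:
    """Hypothetical stand-in for TrialRunner: one runner owns one Environment copy."""

    trial_runner_id: int
    environment: FakeEnvironment


def make_trial_runners(num_trial_runners: int,
                       base_config: Dict[str, Any]) -> List[FakeTrialRunner]:
    """Build one runner per Environment copy (assumed construction order)."""
    runners: List[FakeTrialRunner] = []
    for trial_runner_id in range(num_trial_runners):
        # Each Environment copy gets its own config dict carrying a unique id.
        env_config = dict(base_config, trial_runner_id=trial_runner_id)
        runners.append(FakeTrialRunner(trial_runner_id, FakeEnvironment(env_config)))
    return runners


runners = make_trial_runners(3, {"experiment_id": "demo"})
```

Copying the config per runner keeps the ids from leaking between Environment copies; the real implementation may build its Environments differently.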

TODO:

  • tests

In future PRs we will add:

  • New Scheduler implementations to run TrialRunners in parallel.
  • Async polling of status results in each TrialRunner independently.

@bpkroth bpkroth added the WIP Work in progress - do not merge yet label Mar 19, 2024
@@ -123,6 +137,7 @@ def get_results_df(
'tunable_config_id',
'tunable_config_trial_group_id',
'status',
'trial_runner_id',
Contributor Author

Needs tests

"""Get the running state of the current TrialRunner."""
return self._is_running

def run_trial(self,
Contributor Author

Needs tests

self.root_env_config, TunableGroups(), env_global_config, service=self._parent_service)
self.trial_runners[trial_runner_id] = TrialRunner(trial_runner_id, env)
_LOG.info("Init %d trial runners for environments: %s",
len(self.trial_runners), list(trial_runner.environment for trial_runner in self.trial_runners))
Contributor Author

Needs tests.

})
# Rotate which TrialRunner the Trial is assigned to.
self._current_trial_runner_idx = (self._current_trial_runner_idx + 1) % len(self._trial_runners)
Contributor Author

Needs tests.
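
One possible shape for such a test, as a sketch only. RoundRobinAssigner is a hypothetical helper that mirrors the modulo rotation in the snippet above; whether the real code assigns before or after advancing the index is an assumption here:

```python
from typing import List


class RoundRobinAssigner:
    """Hypothetical helper mirroring the modulo rotation shown in the diff."""

    def __init__(self, trial_runner_ids: List[int]) -> None:
        if not trial_runner_ids:
            raise ValueError("Need at least one trial runner.")
        self._trial_runner_ids = list(trial_runner_ids)
        self._current_trial_runner_idx = 0

    def next_trial_runner_id(self) -> int:
        """Return the current runner id, then rotate to the next one (assumed order)."""
        runner_id = self._trial_runner_ids[self._current_trial_runner_idx]
        self._current_trial_runner_idx = (
            (self._current_trial_runner_idx + 1) % len(self._trial_runner_ids)
        )
        return runner_id


def test_round_robin_wraps_around() -> None:
    """Seven assignments over three runners should wrap around twice."""
    assigner = RoundRobinAssigner([0, 1, 2])
    assigned = [assigner.next_trial_runner_id() for _ in range(7)]
    assert assigned == [0, 1, 2, 0, 1, 2, 0]


test_round_robin_wraps_around()
```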

parser.add_argument(
'--trial_runners', '--trial-runners', required=False, type=int, default=1,
help='Number of trial runners to run in parallel. '
+ 'Individual TrialRunners can be identified in configs with $trial_runner_id.')
Contributor Author

Needs tests.
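
A test for this argument could look roughly like the following. It rebuilds only the one option on a standalone argparse parser, so build_parser is a hypothetical helper and not the launcher's actual parser setup:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser containing only the option from the diff above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--trial_runners', '--trial-runners', required=False, type=int, default=1,
        help='Number of trial runners to run in parallel.')
    return parser


def test_trial_runners_argument() -> None:
    """Both spellings should populate the same attribute, with a default of 1."""
    parser = build_parser()
    # argparse derives the attribute name from the first long option string.
    assert parser.parse_args([]).trial_runners == 1
    assert parser.parse_args(['--trial_runners', '2']).trial_runners == 2
    assert parser.parse_args(['--trial-runners', '4']).trial_runners == 4


test_trial_runners_argument()
```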

TrialRunner
"""
if trial.trial_runner_id is None:
raise ValueError(f"Trial {trial} has no trial_runner_id")
Contributor Author

@bpkroth bpkroth Mar 19, 2024

  • Add fallback support for assigning a Trial to a TrialRunner if one is missing.
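
One possible fallback policy, sketched under assumptions: FakeTrial and resolve_trial_runner_id are hypothetical names, and picking a runner from trial_id modulo the runner count is just one deterministic option, not something this PR specifies:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeTrial:
    """Hypothetical stand-in for a stored Trial record."""

    trial_id: int
    trial_runner_id: Optional[int] = None


def resolve_trial_runner_id(trial: FakeTrial, runner_ids: Sequence[int]) -> int:
    """Return the trial's runner id, assigning a fallback if it has none."""
    if not runner_ids:
        raise ValueError("No trial runners available.")
    if trial.trial_runner_id is None:
        # Assumed policy: derive a stable runner from the trial id.
        trial.trial_runner_id = runner_ids[trial.trial_id % len(runner_ids)]
    elif trial.trial_runner_id not in runner_ids:
        raise ValueError(f"Trial {trial} refers to an unknown trial_runner_id")
    return trial.trial_runner_id


unassigned = FakeTrial(trial_id=7)
resolved = resolve_trial_runner_id(unassigned, [0, 1, 2])
```

A deterministic choice keeps reruns reproducible; a scheduler-driven choice (for example, the next idle runner) would be an equally plausible alternative.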

Comment on lines 57 to +58
# pylint: disable=too-many-statements
# pylint: disable=too-complex
Member

Suggested change
- # pylint: disable=too-many-statements
- # pylint: disable=too-complex
+ # pylint: disable=too-many-statements,too-complex


# NOTE: Init tunable values *after* the Environment, but *before* the Optimizer
self.trial_runners: List[TrialRunner] = []
for trial_runner_id in range(0, self.global_config["num_trial_runners"]):
Member

Suggested change
- for trial_runner_id in range(0, self.global_config["num_trial_runners"]):
+ for trial_runner_id in range(self.global_config["num_trial_runners"]):
