WIP: Introduce TrialRunner Abstraction #720

bpkroth · 2024-03-19T21:24:40Z

This is another step in adding support for parallel trial execution #380.

Here we separate out the running of an individual trial into a single class: TrialRunner.

Multiple TrialRunners are instantiated at CLI invocation with the --num-trial-runners argument.
Each TrialRunner is associated with a single copy of the root Environment and is made unique by a trial_runner_id value that is included in that Environment's global_config.

TODO:

- tests

In future PRs we will add:

- New Scheduler implementations to run TrialRunners in parallel.
- Async polling of status results in each TrialRunner independently.
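
As a rough illustration of the description above, the sketch below gives each runner its own deep copy of a base global config, distinguished only by `trial_runner_id`. `SketchEnvironment`, `SketchTrialRunner`, and `make_trial_runners` are hypothetical stand-ins invented for this example; they are not the mlos_bench classes.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SketchEnvironment:
    """Hypothetical stand-in for a root Environment; it only holds its config."""

    global_config: Dict[str, Any]


@dataclass
class SketchTrialRunner:
    """Hypothetical stand-in for TrialRunner: an id plus its own Environment."""

    trial_runner_id: int
    environment: SketchEnvironment


def make_trial_runners(base_config: Dict[str, Any],
                       num_trial_runners: int) -> List[SketchTrialRunner]:
    """Create one runner per id, each with an independent copy of the config."""
    runners = []
    for trial_runner_id in range(num_trial_runners):
        # Copy so that runners never share mutable config state.
        config = deepcopy(base_config)
        config["trial_runner_id"] = trial_runner_id
        runners.append(SketchTrialRunner(trial_runner_id, SketchEnvironment(config)))
    return runners


runners = make_trial_runners({"experiment_id": "demo"}, num_trial_runners=3)
print([r.environment.global_config["trial_runner_id"] for r in runners])
```

Copying the config per runner is one way to keep the id substitution isolated; whether the real Environment needs a deep copy is an assumption of this sketch.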

bpkroth · 2024-03-19T22:10:58Z

mlos_bench/mlos_bench/storage/sql/common.py

@@ -123,6 +137,7 @@ def get_results_df(
                'tunable_config_id',
                'tunable_config_trial_group_id',
                'status',
+                'trial_runner_id',


Needs tests

bpkroth · 2024-03-19T22:11:59Z

mlos_bench/mlos_bench/schedulers/trial_runner.py

+        """Get the running state of the current TrialRunner."""
+        return self._is_running
+
+    def run_trial(self,


Needs tests
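
One possible shape for such a test, sketched against a hypothetical `SketchTrialRunner` rather than the class in this PR. It assumes the running state is exposed as a property and that `run_trial` sets and clears the flag around the trial; neither detail is confirmed by the hunk above.

```python
from typing import Any, Callable, Dict, List


class SketchTrialRunner:
    """Hypothetical runner that tracks whether a trial is in flight."""

    def __init__(self, trial_runner_id: int) -> None:
        self.trial_runner_id = trial_runner_id
        self._is_running = False

    @property
    def is_running(self) -> bool:
        """Get the running state of the current runner."""
        return self._is_running

    def run_trial(self, trial: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
        """Run one trial callable, refusing to start if one is already running."""
        if self._is_running:
            raise RuntimeError(f"Runner {self.trial_runner_id} is already running")
        self._is_running = True
        try:
            return trial()
        finally:
            # Reset the flag even when the trial raises.
            self._is_running = False


def test_is_running_flag() -> None:
    """The flag is True during the trial and False before and after it."""
    runner = SketchTrialRunner(0)
    seen: List[bool] = []

    def fake_trial() -> Dict[str, Any]:
        # Record the state as observed from inside the trial.
        seen.append(runner.is_running)
        return {"score": 1.0}

    assert runner.is_running is False
    assert runner.run_trial(fake_trial) == {"score": 1.0}
    assert seen == [True]
    assert runner.is_running is False
```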

bpkroth · 2024-03-19T22:12:15Z

mlos_bench/mlos_bench/launcher.py

+                self.root_env_config, TunableGroups(), env_global_config, service=self._parent_service)
+            self.trial_runners[trial_runner_id] = TrialRunner(trial_runner_id, env)
+        _LOG.info("Init %d trial runners for environments: %s",
+                  self.trial_runners, list(trial_runner.environment for trial_runner in self.trial_runners))


Needs tests.

bpkroth · 2024-03-19T22:12:53Z

mlos_bench/mlos_bench/schedulers/base_scheduler.py

            })
+            # Rotate which TrialRunner the Trial is assigned to.
+            self._current_trial_runner_idx = (self._current_trial_runner_idx + 1) % len(self._trial_runners)


Needs tests.
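
A test for the rotation could look roughly like the following. `RoundRobinAssigner` is a hypothetical helper that isolates the modulo step from the hunk above; the real scheduler keeps this index on itself.

```python
from typing import List


class RoundRobinAssigner:
    """Hypothetical helper that cycles through the available runner ids."""

    def __init__(self, trial_runner_ids: List[int]) -> None:
        if not trial_runner_ids:
            raise ValueError("At least one trial runner is required")
        self._trial_runner_ids = list(trial_runner_ids)
        self._current_idx = 0

    def next_runner_id(self) -> int:
        """Return the current runner id, then advance to the next one."""
        runner_id = self._trial_runner_ids[self._current_idx]
        # Rotate which runner the next trial is assigned to.
        self._current_idx = (self._current_idx + 1) % len(self._trial_runner_ids)
        return runner_id


def test_round_robin_wraps() -> None:
    """Seven assignments over three runners wrap around twice."""
    assigner = RoundRobinAssigner([0, 1, 2])
    assigned = [assigner.next_runner_id() for _ in range(7)]
    assert assigned == [0, 1, 2, 0, 1, 2, 0]
```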

bpkroth · 2024-03-19T22:14:41Z

mlos_bench/mlos_bench/launcher.py

+        parser.add_argument(
+            '--trial_runners', '--trial-runners', required=False, type=int, default=1,
+            help='Number of trial runners to run in parallel. '
+            + 'Individual TrialRunners can be identified in configs with $trial_runner_id.')


Needs tests.
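
A minimal sketch of a parsing test, assuming the argument is registered exactly as in the hunk above. `build_parser` is a hypothetical wrapper for this example; the real Launcher builds a larger parser. With two long option strings, `argparse` derives the destination from the first one, so the value lands in `trial_runners`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser containing only the option under test."""
    parser = argparse.ArgumentParser(description="trial runner option sketch")
    parser.add_argument(
        '--trial_runners', '--trial-runners', required=False, type=int, default=1,
        help='Number of trial runners to run in parallel. '
        + 'Individual TrialRunners can be identified in configs with $trial_runner_id.')
    return parser


def test_trial_runners_option() -> None:
    """Both spellings parse to the same attribute, and the default is 1."""
    parser = build_parser()
    assert parser.parse_args([]).trial_runners == 1
    assert parser.parse_args(['--trial_runners', '4']).trial_runners == 4
    assert parser.parse_args(['--trial-runners', '2']).trial_runners == 2
```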

bpkroth · 2024-03-19T22:15:33Z

mlos_bench/mlos_bench/schedulers/base_scheduler.py

+        TrialRunner
+        """
+        if trial.trial_runner_id is None:
+            raise ValueError(f"Trial {trial} has no trial_runner_id")


Add fallback support for assigning a Trial to a TrialRunner if one is missing.
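
One way the fallback might work is sketched below. The policy used here (spread unassigned trials by `trial_id` modulo the number of runners) is purely an assumption for illustration, and `SketchTrial` and `resolve_trial_runner_id` are hypothetical names rather than part of this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchTrial:
    """Hypothetical stand-in for a Trial with an optional runner assignment."""

    trial_id: int
    trial_runner_id: Optional[int] = None


def resolve_trial_runner_id(trial: SketchTrial, trial_runner_ids: List[int]) -> int:
    """Return the trial's runner id, assigning one first if it is missing."""
    if not trial_runner_ids:
        raise ValueError("No trial runners available")
    if trial.trial_runner_id is None:
        # Assumed fallback policy: a deterministic spread based on trial_id.
        trial.trial_runner_id = trial_runner_ids[trial.trial_id % len(trial_runner_ids)]
    elif trial.trial_runner_id not in trial_runner_ids:
        raise ValueError(f"Trial {trial} refers to an unknown trial_runner_id")
    return trial.trial_runner_id


unassigned = SketchTrial(trial_id=5)
print(resolve_trial_runner_id(unassigned, [0, 1, 2]))
```

A deterministic policy keeps the assignment reproducible across restarts; picking the least-loaded runner would be an alternative once runners execute in parallel.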

motus · 2024-03-20T01:43:54Z

mlos_bench/mlos_bench/launcher.py

        # pylint: disable=too-many-statements
+        # pylint: disable=too-complex


Suggested change

-        # pylint: disable=too-many-statements
-        # pylint: disable=too-complex
+        # pylint: disable=too-many-statements,too-complex

motus · 2024-03-20T20:28:39Z

mlos_bench/mlos_bench/launcher.py

-
-        # NOTE: Init tunable values *after* the Environment, but *before* the Optimizer
+        self.trial_runners: List[TrialRunner] = []
+        for trial_runner_id in range(0, self.global_config["num_trial_runners"]):


Suggested change

-        for trial_runner_id in range(0, self.global_config["num_trial_runners"]):
+        for trial_runner_id in range(self.global_config["num_trial_runners"]):

motus added 30 commits February 21, 2024 15:34

do not pass the optimizer into _run()

616f44e

mypy fixes

33e332a

start splitting the optimization loop into two

0247259

first complete version of the optimization loop (not tested yet)

483e378

Merge branch 'main' into sergiym/run/2loops

addd5a4

allow running mlos_bench.run._main directly from unit tests + add a unit test for bench (not tested)

e97266f

move in-process launch to a separate unit test file

64771fd

add is_warm_up flag to the optimization step

bd7c55e

Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/2loops

387722a

in-process optimizaiton loop invocation works!

9f15aee

add multi-iteration optimization to in-process test; fix the mlos_core bulk registration (check for is_warm_up)

65cd072

make in-process launcerh tests pass

c010d95

remove unnecessary local variables to make pylint happy

7cfef3a

move trial_config_repeat_count checks to the launcher

7233180

make experiment.load() return trial_ids and use them in the optimization loop

be7dcec

use proper last_trial_id in the main loop; fix the unit tests

3c52e03

update launcher tests with the new output patterns

0d9dc97

remove unused variable

4e171e0

Merge branch 'main' into sergiym/run/2loops

ab69fa0

better naming for functions in the optimization loop

52adab8

start implementing the scheduler class

df893d9

change the default value for is_warm_up parameter to False

5aca764

Merge branch 'sergiym/run/2loops' into sergiym/run/scheduler

4d183df

started to implement teh start() method of the sync scheduler

309e10c

Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/scheduler

9a72b40

implement proper Scheduler constructor

ffe23e1

more clean-ups to the base scheduler

cb863e0

minor pylint fixes

990b019

add _add_trial_to_queue() method

2ac0520

better handling of warm-up phase (no redundant code)

b95100a

bpkroth and others added 10 commits March 15, 2024 20:04

fixup relative paths

2ca34cd

basic schema testing

946b0c4

Merge branch 'main' into sergiym/run/scheduler_load

58e8609

add another test case

7985a3e

Update mlos_bench/mlos_bench/launcher.py

8070c30

Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>

pylint

f395531

Merge remote-tracking branch 'sergiy/sergiym/run/scheduler_load' into parallel-async-trial-runners

678f4c5

remove async status changes for now - future PR

969e496

wip

6f4928f

wip: refactor running of a trial to a separate class so we can do them in parallel eventually

92382dc

bpkroth added the WIP Work in progress - do not merge yet label Mar 19, 2024

bpkroth added 2 commits March 19, 2024 21:57

Merge branch 'main' into trial-runner-abstraction

f00c975

comments

e91b744

bpkroth commented Mar 19, 2024

bpkroth and others added 9 commits March 19, 2024 22:16

consistency

64e7575

Merge branch 'main' into trial-runner-abstraction

8a9e29e

fixup

32c01c0

schema tests

0e89e25

spelling

5549925

make sure trial_runner_id shows up by default

7feba3a

wip: fixups

cc7ed4d

fixme comments

8d794f1

Launcher args fixups

967b6e2

bpkroth mentioned this pull request Mar 21, 2024

Fixups and testing for cli config file parsing #722

Draft

motus reviewed Mar 21, 2024
